METHOD FOR DETECTING A RANDOM PROCESS
IN A CONVEX HULL VOLUME


STATEMENT OF GOVERNMENT INTEREST

[0001] The invention described herein may be manufactured and used by or for the Government of the United States of America for governmental purposes without the payment of any royalties thereon or therefor.

BACKGROUND OF THE INVENTION
Field of the Invention

[0002] The present invention relates to the field of sonar signal processing and, more particularly, to detecting the presence or absence of spatial random processes in physical phenomena.

Description of the Prior Art

[0003] In some cases, it can be important or critical to know with a high probability whether data received by a sonar system is simply random noise (which may produce a false alarm) or is more likely due to the detection of a vessel of interest. In either situation, it is critical to make a determination as quickly as possible.
[0004] Naval sonar systems require that signals be categorized according to structure (i.e., periodic, transient, random, or chaotic). A variety of large sample data processing methods, such as spectral analysis, correlogram plots, and the like, are available. However, a number of scenarios yield only small samples. These include lost or intermittent contact, transients, equipment failure, own-ship maneuver, and the like. Such sparse data sets require methods that are appropriate for reliable and valid processing.

[0005] As such, there is a need for sparse data set methods that are separate from those methods that evaluate large sample distributions. It is well known in the art that large sample methods often fail when applied to small sample data sets.

[0006] The term "randomness" in regard to random noise has different meanings in science and engineering. Random (or randomness) is herein defined in terms of a "random process" as measured by a probability distribution model, namely a stochastic (Poisson) process. In naval engineering applications, waveform distributions in the time domain may be considered purely random if the distributions conform to a noise structure such as WGN (White Gaussian Noise). This determination is made regardless of the underlying generating mechanism that produced the "noise."

[0007] Pure randomness may be considered a data distribution for which no mathematical function, relation, or mapping can be constructed that provides an insight into the underlying structure. For example, no prediction model can be generated from the noise/time waveform in order to derive estimates of a target range, course, speed, depth, etc. One must also distinguish "stochastic" randomness from "deterministic" randomness (chaos) as described in United States Patent No. 5,781,460.

[0008] The theoretical and practical considerations relevant to the inventive process are contained in the following publications, which are incorporated herein by reference:

[0009] Abramowitz, Milton and Irene Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Washington, DC: United States Government Printing Office (1964).

[0010] Feller, William. An Introduction to Probability Theory and Its Applications. 2nd ed. Vol. I. New York: John Wiley and Sons (1957).

[0011] Rukhin, A. L. "Testing Randomness: A Suite of Statistical Procedures." Theory of Probability and Its Applications, Vol. 45, No. 1, pp. 111-132 (2000).

[0012] Preparata, Franco P. and Michael I. Shamos. Computational Geometry: An Introduction. Springer-Verlag (1985).

[0013] Swed, F. S. and C. Eisenhart. "Tables for testing randomness of grouping in a sequence of alternatives." The Annals of Mathematical Statistics, Vol. 14, No. 1, pp. 66-87 (March 1943).

[0014] Wald, A. and J. Wolfowitz. "On a test whether two samples are from the same population." The Annals of Mathematical Statistics, Vol. 11, pp. 147-162 (1940).

[0015] Wilks, S. S. "Order statistics." Bulletin of the American Mathematical Society, Vol. 54, No. 1, Part 1, pp. 6-50 (1948).

[0016] The standard approach for assessing the hypothesis of spatial randomness for large samples is outlined in the known work on probability theory by W. Feller (Ch. 6, "The Binomial and Poisson Distributions") [Feller, William. An Introduction to Probability Theory and Its Applications. 2nd ed. Vol. I. New York: John Wiley and Sons (1957)].

[0017] Typically, a Chi-square test for homogeneity of Poisson frequency levels is computed from a frequency table of counts of spatial data in a partitioned subspace and is compared to a level of statistical certainty. The Feller reference (pp. 149-154) demonstrates the utility of this procedure for several large samples of naturalistic data analyzed in finite rectangular and circular space. The noted data sets include radioactive decay measurements, micro-organism distribution on a Petri dish, and others. However, the Feller reference provides little guidance on subspace partitioning, including how many partitions should be used and what should be done about non-whole subset partitions.

[0018] Furthermore, most prior art randomness assessment methods are one-time tests designed for one-dimensional or two-dimensional space. The methods are primarily applicable to truly random distributions. However, these quantitative techniques sometimes even fail to correctly label truly nonrandom distributions, as pointed out by Rukhin (A. L. Rukhin, "Testing Randomness: A Suite of Statistical Procedures", Theory of Probability and Its Applications, 2000, Vol. 45, No. 1, pp. 111-132).

[0019] The following United States patents significantly improve the above-noted situation.

[0020] United States Patent No. 7,277,573 provides a multi-stage method for automatically characterizing data sets containing data points, each defined by measurements of three variables, as either random or non-random. A three-dimensional Cartesian volume is sized to contain the total number N of data points in the data set to be characterized. The Cartesian volume is partitioned into equal-sized cubes, each of which may or may not contain a data point. A predetermined route is defined that passes through every cube once and scores each cube as a one or a zero, thereby producing a stream of ones and zeros. The number of runs is counted and used in a Runs test that predicts whether the N data points in a data set are random or non-random. Additional tests are used in conjunction with the Runs test to increase the accuracy of characterizing each data set as random or non-random.

[0021] United States Patent No. 7,409,323 provides a method for automatically characterizing data sets containing data points, which may be produced by measurements such as with sonar arrays, as either random or non-random. The data points for each data set are located within a Cartesian space, and a polygon envelope is constructed which contains the data points. The polygon is divided into grid cells by constructing a grid over the polygon. A prediction is then made as to how many grid cells would be occupied if the data were merely a random process. The prediction takes one of two forms depending on the sample size: for small sample sizes, an exact Poisson probability method is used; for large sample sizes, an approximation to the exact Poisson probability is used. A third test checks whether the Poisson-based model is adequate to assess the data set as either random or non-random.

[0022] In summary, the prior art does not disclose a method that provides a faster and more reliable solution for three-dimensional data sets of widely varying sizes. Solutions to the above-described and related problems have long been sought without success. Consequently, those skilled in the art will appreciate the present invention, which addresses these and other related problems.


SUMMARY OF THE INVENTION

[0023] It is therefore a general purpose and primary object of the present invention to provide an improved method for characterizing data sets of physical phenomena, such as sonar array signals, medical imaging data, and the like, as random noise or as containing a signal.

[0024] It is a further object of the present invention to provide a method for characterizing large data sets as well as sparse data sets.

[0025] Accordingly, the present invention provides a method for characterizing a plurality of data sets as being random noise or as containing a signal. The method comprises the steps of reading in data points from a first data set of the plurality of data sets and then creating a three-dimensional hull that encloses the data points. The method further comprises a step of ensuring that the hull has a structure that passes through at least four non-coplanar data points from the first data set.

[0026] Additional steps comprise partitioning the three-dimensional hull into a plurality of three-dimensional cells and defining the first data set as being a large sample or a small sample based on a selected parameter.

[0027] The method further comprises the steps of utilizing a first plurality of tests for characterizing the first data set as comprising random noise or a signal when the first data set is defined as a large sample, and utilizing a second plurality of tests for characterizing the first data set when the first data set is defined as a small sample.

[0028] In one possible embodiment, the method may comprise a step of partitioning the total volume V of data points into the plurality of three-dimensional cells by utilizing at least one formula in which $V_P$ is the volume of the convex hull, N is the number of points, and k is a value based at least partially on N.

[0029] The method may also comprise the step of ending the testing as soon as any test of the first plurality of tests or of the second plurality of tests indicates that the first data set comprises the signal.

[0030] In another possible embodiment, the second plurality of tests comprises determining a significance probability value for the small sample for a two-tailed hypothesis for a quasi-symmetric finite discrete Poisson probability distribution.

[0031] The first plurality of tests may comprise at least a Runs test, a correlation test, an R ratio and confidence interval analysis, and a normal approximation z-test for a Poisson distribution. These tests are performed on the number of non-empty cells of the plurality of cells, in a predefined sequential order.

[0032] The second plurality of tests may comprise at least a Runs test, a correlation test, an R ratio and confidence interval analysis, and an exact Poisson distribution hypothesis test, performed in a predefined sequential order.


BRIEF DESCRIPTION OF THE DRAWINGS
[0033] A more complete understanding of the invention and many of the attendant advantages thereto will be readily appreciated as the same becomes better understood by reference to the following detailed description when considered in conjunction with the accompanying drawings, wherein like reference numerals refer to like parts and wherein:

[0034] FIG. 1 is a plot of random data points within a convex hull in accordance with the present invention;

[0035] FIG. 2 is a flow diagram for a method to characterize data sets in accordance with the present invention; and

[0036] FIGS. 3A-3C are flow diagrams for the method to characterize data sets in accordance with the present invention.


DETAILED DESCRIPTION OF THE INVENTION
[0037] The present invention enhances the likelihood that a correct decision is made in multi-dimensional space for samples of varying sizes. The invention also provides a method to determine whether a three-dimensional data structure conforms to a random process (i.e., is predominantly random).

[0038] In the preferred embodiment, the present invention creates a compact space by forming a convex hull around time-based measurements. Convex hulls as used herein are discussed by Franco P. Preparata and Michael I. Shamos, Computational Geometry: An Introduction, Springer-Verlag, 1985; that discussion is incorporated herein by reference.

[0039] As used herein, the convex hull of a set of points in space is the closed surface of minimum area with convex (outward) curvature that encloses all the points in the set, passing through the outermost ones. In three dimensions, the set must contain at least four distinct, non-coplanar points to form a closed surface with a nonzero enclosed volume.

[0040] Typically, a convex hull occupies about thirty-five to sixty percent less volume than the rectangular solid needed to contain the same points, such as the solid proposed in United States Patent No. 7,277,573. Generally, the larger the sample size, the smaller this difference becomes. A major advantage of the present invention is therefore a more compact region, meaning less processing time. This reduced processing time is especially noticeable for smaller measurement sets.
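The hull-versus-box comparison above can be checked numerically. The Python sketch below is illustrative only: it finds hull facets by brute force (every triple of points whose plane leaves all other points on one side), which assumes the points are in general position and is practical only for sets of tens of points; a production system would more likely use a dedicated hull library such as Qhull. The function names `hull_volume` and `box_volume` are hypothetical.

```python
import itertools
import random


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _triple(u, v, w):
    """Scalar triple product u . (v x w), i.e. six times a signed tetrahedron volume."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def hull_volume(points, eps=1e-12):
    """Volume of the 3-D convex hull of `points`, assumed to be in general position."""
    pts = [tuple(float(c) for c in p) for p in points]
    if len(pts) < 4:
        raise ValueError("need at least four non-coplanar points")
    n = len(pts)
    centroid = tuple(sum(p[i] for p in pts) / n for i in range(3))
    volume = 0.0
    for a, b, c in itertools.combinations(pts, 3):
        # A triangle is a hull facet if all other points lie on one side of its plane.
        sides = set()
        for p in pts:
            s = _triple(_sub(b, a), _sub(c, a), _sub(p, a))
            if abs(s) > eps:
                sides.add(s > 0)
        if len(sides) == 1:
            # Add the tetrahedron spanned by the facet and the interior centroid.
            volume += abs(_triple(_sub(a, centroid), _sub(b, centroid),
                                  _sub(c, centroid))) / 6.0
    if volume <= eps:
        raise ValueError("points are coplanar; the hull has no volume")
    return volume


def box_volume(points):
    """Volume of the axis-aligned rectangular solid that just contains `points`."""
    spans = [max(p[i] for p in points) - min(p[i] for p in points) for i in range(3)]
    return spans[0] * spans[1] * spans[2]


if __name__ == "__main__":
    rng = random.Random(0)
    # Fifty hypothetical measurements (t, y, z) with amplitudes of 40 and 30 units.
    pts = [(t, 40 * rng.random(), 30 * rng.random()) for t in range(50)]
    ratio = hull_volume(pts) / box_volume(pts)
    print(f"hull occupies {ratio:.0%} of the bounding box")
```

The brute-force facet search costs roughly N^4 operations, which is acceptable for the sparse sets this method targets but not for large ones.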

[0041] A sequenced set of randomness assessment tools tests the randomness hypotheses. The testing is conducted in a sequenced multi-stage paradigm with built-in protocols for detecting aberrant data structures. A flexible mix of known parametric, nonparametric, and correlational testing procedures can be selected for similar problems in military and commercial environments.

[0042] In one embodiment of the invention, a streamlined decision module functions on an "all or nothing" principle. In another embodiment, an operator has the option of ceasing randomness assessment upon one or more instances of a non-random testing result. This approach maximizes the likelihood of a correct decision in a shorter period of time and minimizes the chance of an incorrect decision regarding the signal-noise hypothesis. It also reduces unnecessary data processing time when searching for a signal classification in the observed noise-dominated data.
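One way to picture the two decision-module options is a loop over the randomness tests in their predefined order. The Python sketch below is hypothetical throughout: each test is assumed to be a callable that returns a significance probability p, the names `assess`, `stop_early`, and `alpha` are illustrative, and the "all or nothing" mode is read here (an assumption) as calling NOISE only when every test retains the noise hypothesis.

```python
from typing import Callable, List, Sequence, Tuple

NOISE = "NOISE"
SIGNAL = "SIGNAL + NOISE"

# Hypothetical test signature: a data set in, a significance probability p out.
TestFn = Callable[[object], float]


def assess(data, tests: Sequence[TestFn], alpha: float = 0.05,
           stop_early: bool = True) -> Tuple[str, List[float]]:
    """Run randomness tests in order; return the decision and the p values examined."""
    p_values: List[float] = []
    for test in tests:
        p = test(data)
        p_values.append(p)
        if stop_early and p <= alpha:
            # Operator option: cease assessment at the first non-random result.
            return SIGNAL, p_values
    # "All or nothing" reading: NOISE only if every test retained the noise hypothesis.
    decision = NOISE if all(p > alpha for p in p_values) else SIGNAL
    return decision, p_values
```

With `stop_early=True` the remaining tests are skipped once one of them rejects the noise hypothesis, which is the processing-time saving described above.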

[0043] Table 1 reflects the structure of the data sets that this invention evaluates in its analysis subsystems.

Table 1

Typical Data Set of a Time-Series in Three-Dimensional Cartesian Space for N Measurements

Time (t)    Measurement (y)    Measurement (z)
t_0         y_0                z_0
t_1         y_1                z_1
...         ...                ...
t_{N-1}     y_{N-1}            z_{N-1}

[0044] It is noted that Time (t) may be replaced by a non-temporal continuous variable X.

[0045] These input time-series (or non-time-series) data of unknown structure are first enveloped in a convex hull. The solid polyhedral shape of the hull is then partitioned into a predetermined number of three-dimensional cubic cells along the x axis (typically clock time). FIG. 1 depicts a convex hull 10 enclosing a pseudo-random data set of fifty time-data points (x) [shown as labeled item 12] with randomized amplitudes of forty and thirty units (y, z), partitioned into sixty cubic cells. The observation space is reduced by approximately fifty percent, which is the key to a faster solution. After the waveform is enclosed with a convex hull, the convex hull is partitioned into three-dimensional partitions (as indicated by partitioning lines 14).

[0046] Following this method, a noise-free hull-enclosed helix can be determined to contain a signal with a high degree of certainty. Other input waveforms comprising data points, such as a hull-enclosed elliptic paraboloid, are also found to contain a signal with a high degree of certainty.

[0047] Exemplary partitioning methods are explained as follows:

[0048] Method 1: The first method employs an algorithm that accounts for the length of each axis and the number of points used to determine an ideal number of cubes, k, into which to partition the total volume V. Taking the cube root of the value found (k) gives the length of the side of each cube. The value k is the optimal number of partitions of cubic subspace as described in United States Patent No. 6,980,926 (O'Brien). Beginning at zero, the axes are partitioned according to the length of the side.
[0049] Method 2: The second method uses the formula $\sqrt[3]{V_P / N}$, where $V_P$ is the volume of the hull and N is the number of points, to find the length of each side. Beginning at zero, the axes are partitioned according to this value.

[0050] Method 3: The third method uses the formula

Beginning at zero, the axes are partitioned according to this value. See FIG. 1 for this partitioning model.

[0051] Method 4: The fourth and final method uses the same formulae but also eliminates excess space around the hull. This method requires that at least one point on the face of the convex hull be tangent to the y-z plane of the containing region (x, y, z). Another alternative deletes or minimizes non-whole cubic subspaces. The fourth method is preferred because it affords the tightest possible envelope of an input waveform.
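The cube-side rules above can be sketched as follows. This Python example rests on assumptions: the side length is taken as the cube root of a per-cell volume (for example $V_P/N$ as in Method 2), and each axis is shifted so that its smallest coordinate sits at zero before partitioning. The helper names `cell_side` and `occupancy` are hypothetical. Only non-empty cells are reported; which empty cells lie inside the hull, and the order in which cells are traversed, follow United States Patent No. 7,277,573 and are not reproduced here.

```python
import math
from collections import Counter


def cell_side(volume: float, cells: float) -> float:
    """Edge length of a cube whose volume is volume / cells (e.g. V_P / N)."""
    if volume <= 0 or cells <= 0:
        raise ValueError("volume and cell count must be positive")
    return (volume / cells) ** (1.0 / 3.0)


def occupancy(points, side: float) -> Counter:
    """Count points per cubic cell, partitioning each axis from zero in steps of `side`."""
    # Assumption: shift each axis so its minimum coordinate is zero.
    mins = [min(p[i] for p in points) for i in range(3)]
    counts = Counter()
    for p in points:
        index = tuple(math.floor((p[i] - mins[i]) / side) for i in range(3))
        counts[index] += 1
    return counts
```

For example, a hypothetical hull volume of 8.0 cubic units with one point gives a side of 2.0, and points at (0, 0, 0) and (1, 1, 1) then share cell (0, 0, 0).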

Large Sample Testing Procedures

Method A (Wald-Wolfowitz Independent Sample Runs Test Procedure)
[0052] An initial statistical test is performed on the input distributions to evaluate the time-series structure of individual data sets. The Runs test is a non-parametric combinatorial test that assesses a randomness hypothesis for a two-valued data sequence and is well known to those skilled in the art [Wald, A. and J. Wolfowitz. "On a test whether two samples are from the same population." Ann. Math. Stat., Vol. 11, pp. 147-162 (1940)].

[0053] The Runs test has previously been applied in the art to spatial distributions. The test is attractive because it can assess spatial randomness in small or large samples, with exact probabilities, when the assumptions of parametric testing procedures are not met. Its novel utility in three dimensions was first demonstrated for a rectangular solid envelope in United States Patent No. 7,277,573 (O'Brien).

[0054] In the Runs test, the procedural steps for a convex hull that is partitioned into cubic subspaces are as follows:
[0055] Step 1. Assign a value of "0" or "1" to indicate that a cell is empty or non-empty, respectively. The assignment is independent of the number of points in a cell and of the cell size. Subsequently, count the number of runs in the observation space of the volume in the same manner specified in United States Patent No. 7,277,573 for a three-dimensional data set. A run (also known as a "clump") is a countable sequence of one or more consecutive, identical outcomes. For the present invention, a run is a sequential and homogeneous stream of assigned 0 or 1 data that is followed by a stream of the other value or by the end of the sequence.
[0056] Arbitrarily label the total number of 1 data identifiers by $n_1$ and the total number of 0 data identifiers by $n_2$. For example, the following data exhibit has $n_1$ = eight 1 data identifiers and $n_2$ = thirteen 0 data identifiers. The total sample size is $n = n_1 + n_2$ (here, twenty-one), and the exhibit contains six runs:

000  11  00000  1111  00000  11
 1   2     3     4      5    6

r = six total runs
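Counting $n_1$, $n_2$, and r for a 0/1 stream is a one-pass operation. The Python sketch below (function name `count_runs` is hypothetical) uses `itertools.groupby`, which yields one group per maximal block of identical values, so the number of groups equals the number of runs; spaces in the input are ignored so that the exhibit above can be passed as written.

```python
from itertools import groupby


def count_runs(sequence: str):
    """Return (n1, n2, r): count of 1s, count of 0s, and number of runs."""
    bits = [c for c in sequence if c in "01"]  # ignore spacing used for display
    n1 = bits.count("1")
    n2 = bits.count("0")
    r = sum(1 for _ in groupby(bits))  # one group per maximal run
    return n1, n2, r


print(count_runs("000 11 00000 1111 00000 11"))  # (8, 13, 6)
```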

[0057] Here, the sample shows r = six runs (out of more than 200,000 possible arrangements of the eight 1s and thirteen 0s), which may be tested for randomness. A sample of ordered binary data (1/0) corresponding to the behavior of the amplitudes of the time series may show too few or too many runs to be attributable to mere chance variation. Such a sample indicates deterministic signal information that may be extracted in detecting or tracking objects in an ocean environment. Alternatively, the number of runs may be in accordance with the laws of probability, thereby indicating a mere chance fluctuation in the behavior of the time-series distribution. This fluctuation is indicative of random noise.
[0058] Step 2. In a distribution that is truly random, the expected (average) number of total runs E(r) is given by the derived relationship:

$$E(r) = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \tag{1}$$


[0059] Step 3. The variance (spread) in the number of runs of a random sample is computed as:

$$\sigma_r^2 = \frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)} \tag{2}$$


[0060] Step 4a. For large samples, to statistically assess the relationship of the total sample number of runs r in three-dimensional space to the distributional moments E(r) and $\sigma_r^2$, the sample statistic r is converted to a Gaussian (normally distributed) test statistic z with a mean of 0 and a variance of 1:

$$z = \frac{r - E(r)}{\sqrt{\sigma_r^2}}, \qquad (n_1, n_2 > 10) \tag{3}$$
[0061] Step 4b. Compute the significance probability p of the observed result from the continuous standard Gaussian (normal) distribution:

$$p = \Pr(Z < -|z|) + \Pr(Z > |z|) = 1 - \left[\Pr(Z < |z|) - \Pr(Z < -|z|)\right] = 1 - \frac{1}{\sqrt{2\pi}} \int_{-|z|}^{+|z|} \exp\left(-0.5 x^2\right) dx, \qquad -\infty < z < +\infty, \quad 0 < p \le 1 \tag{4}$$

where $|\cdot|$ denotes absolute value and Pr denotes probability. In one embodiment, a continuity correction factor of -0.5 may be added to the absolute value of the numerator for small samples where $n_1, n_2 \le 10$.

[0062] The p value is the probability, if the data are random noise, of obtaining a result at least as extreme as the one observed; it therefore measures how plausible the null hypothesis of random noise is. Small values of p lead to rejection of the null hypothesis of noise.

[0063] For example, in the case of pure noise, z = 0 in Equation (3) and p = 1 by Equation (4). In the case of a pure signal, $|z| \to \infty$ and $p \to 0$ by Equation (4). The calculation of p, well known to those skilled in the art, is performed with a standard finite series expansion.
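Equations (1) through (4) translate directly into code. The Python sketch below is illustrative: it evaluates the normal tail with the identity $p = \operatorname{erfc}(|z|/\sqrt{2})$ (available as `math.erfc`) instead of the finite series expansion mentioned above, and the function names are hypothetical.

```python
import math


def runs_moments(n1: int, n2: int):
    """Mean and variance of the number of runs under randomness, Eqs. (1)-(2)."""
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2) / (n * n * (n - 1))
    return mean, var


def runs_z(r: int, n1: int, n2: int, continuity: bool = False) -> float:
    """Standardized runs statistic, Eq. (3), with optional 0.5 continuity correction."""
    mean, var = runs_moments(n1, n2)
    diff = r - mean
    if continuity:
        # Shrink |r - E(r)| by 0.5 (not past zero) while keeping its sign.
        diff = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    return diff / math.sqrt(var)


def two_sided_p(z: float) -> float:
    """p = Pr(|Z| >= |z|) for a standard normal Z, Eq. (4)."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For the worked example later in this section ($n_1 = 8$, $n_2 = 21$, r = 6), `runs_z(6, 8, 21, continuity=True)` is about -2.91 (negative because r < E(r)), and `two_sided_p` of that value is about .0036.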

[0064] An estimate of the p value is provided for both the large and small sample testing procedures. This approach streamlines the evaluation process to a simple comparison of p against the a priori false alarm rate $\alpha$.

[0065] Step 4c. If the sample is small ($n_1, n_2 \le 10$), save the $n_1$, $n_2$, and r values in memory and proceed to Step 5.

[0066] Step 5. Calculate the p value, either from the z statistic by Equation (4) or, for small samples, from the exact distribution of runs. The cumulative probability for the computed sample runs r' is determined by computing the quantity $\Pr(r \le r')$, the likelihood of obtaining that many runs or fewer in a random sample.

[0067] To obtain the two-sided equivalent for non-directional hypotheses, the above probability is doubled to obtain the composite significance probability, $p = 2\Pr(r \le r')$ (or $p = 2\Pr(r \ge r')$ when r' lies above E(r)). The probability of runs, conditional upon r being an even or odd number, is provided by the following combinatorial ratios [see Wilks, S. S. "Order statistics." Bulletin of the American Mathematical Society, Vol. 54, No. 1, Part 1, pp. 6-50 (1948)].

[0068] For the case of r EVEN, the point probability is

$$\Pr(r = 2k) = \frac{2\binom{n_1 - 1}{k - 1}\binom{n_2 - 1}{k - 1}}{\binom{n_1 + n_2}{n_1}}, \qquad k = 1, 2, \ldots, \min(n_1, n_2) \tag{5}$$

where k is found from r = 2k and $\binom{a}{b} = \frac{a!}{b!\,(a - b)!}$ is the binomial coefficient in combinatorial notation.


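[0069] For the case of r ODD, the point probability is

$$\Pr(r = 2k - 1) = \frac{\binom{n_1 - 1}{k - 1}\binom{n_2 - 1}{k - 2} + \binom{n_1 - 1}{k - 2}\binom{n_2 - 1}{k - 1}}{\binom{n_1 + n_2}{n_1}}, \qquad k = 2, 3, \ldots \tag{6}$$

where k is found from r = 2k - 1, and the binomial coefficient is as above.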


[0070] The total cumulative probability for a two-sided alternative is derived by summing the point probabilities above and doubling the resulting tail probability. The cumulative probability value p is obtained in accordance with the process specified in Swed and Eisenhart [F. S. Swed and C. Eisenhart, "Tables for testing randomness of grouping in a sequence of alternatives," The Annals of Mathematical Statistics, 14(1), pp. 66-87 (March 1943)].
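[0071] For the cumulative probability, with r' EVEN or ODD:

$$\Pr(r \le r') = \binom{n_1 + n_2}{n_1}^{-1} \left[ \sum_{k=1}^{\lfloor r'/2 \rfloor} 2\binom{n_1 - 1}{k - 1}\binom{n_2 - 1}{k - 1} + \sum_{k=2}^{\lfloor (r'+1)/2 \rfloor} \left( \binom{n_1 - 1}{k - 1}\binom{n_2 - 1}{k - 2} + \binom{n_1 - 1}{k - 2}\binom{n_2 - 1}{k - 1} \right) \right] \tag{7}$$

where $n_1 + n_2 = n$; the first sum collects the even run counts r = 2k and the second the odd run counts r = 2k - 1.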
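Equations (5) through (7) can be evaluated exactly with integer binomial coefficients. In the Python sketch below (function names hypothetical), `math.comb` and `fractions.Fraction` keep the arithmetic exact, and the two-sided p value is formed by doubling the smaller tail, which is one common reading of the doubling rule above.

```python
from fractions import Fraction
from math import comb


def runs_point_prob(r: int, n1: int, n2: int) -> Fraction:
    """Exact Pr(R = r) for the number of runs R under randomness, Eqs. (5) and (6)."""
    if n1 < 1 or n2 < 1:
        raise ValueError("both symbol counts must be at least 1")
    if r < 2:
        return Fraction(0)  # with both symbols present there are at least two runs
    if r % 2 == 0:  # r = 2k, Eq. (5)
        k = r // 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:  # r = 2k - 1, Eq. (6)
        k = (r + 1) // 2
        ways = (comb(n1 - 1, k - 1) * comb(n2 - 1, k - 2)
                + comb(n1 - 1, k - 2) * comb(n2 - 1, k - 1))
    return Fraction(ways, comb(n1 + n2, n1))


def runs_cdf(r_obs: int, n1: int, n2: int) -> Fraction:
    """Exact cumulative probability Pr(R <= r_obs), Eq. (7)."""
    return sum((runs_point_prob(r, n1, n2) for r in range(2, r_obs + 1)), Fraction(0))


def exact_two_sided_p(r_obs: int, n1: int, n2: int) -> float:
    """Two-sided p value: double the smaller tail probability, capped at 1."""
    lower = runs_cdf(r_obs, n1, n2)
    upper = 1 - runs_cdf(r_obs - 1, n1, n2)  # Pr(R >= r_obs)
    return float(min(Fraction(1), 2 * min(lower, upper)))
```

For $n_1 = 8$, $n_2 = 21$, `runs_cdf(6, 8, 21)` returns 10039/4292145 (about .0023), and the doubled value is about .0047.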


[0072] The above cumulative probability values must be doubled to assess the non-directional hypothesis specified below. The cumulative probability value p is the most important datum to employ in the decision rule for this subsystem test.

[0073] Standard statistical practice is used throughout the present method. The rule specifies:

$p > \alpha \Rightarrow$ Noise
$p \le \alpha \Rightarrow$ Signal + Noise

where $\alpha$ is the false alarm rate, typically set at the five percent level or lower, and p is the significance probability of the observed calculation. The Probability of False Alarm (PFA) is defined as $\alpha = \Pr(\text{reject } H_0 \mid H_0 = \text{True})$, the probability that the null hypothesis (NOISE) is rejected when it is true.


[0074] In the decision truth table of signal detection theory, this is considered the most serious error for this noise processor system. That is, calling SIGNAL when NOISE is dominant is a serious error. It results in unnecessary processing time and must be minimized.

[0075] The value of $\alpha$ represents the percentage of time that this wrong decision will be made (that is, the error of rejecting a null hypothesis that is actually true). Minimizing this type of error is therefore a substantive factor in the present method. Correspondingly, minimizing $\alpha$ also maximizes $1 - \alpha$, defined as $\Pr(\text{accept } H_0 \mid H_0 = \text{True})$, which amounts to calling noise correctly. "True" indicates that the distribution is truly random. If $\alpha$ = five percent, this confidence probability $1 - \alpha$ is ninety-five percent.

[0076] For a hypothesis test, the non-directional (two-tailed) binary hypothesis set is:

$H_0: r = E(r)$ (NOISE ONLY)
$H_1: r \ne E(r)$ (SIGNAL + NOISE)


[0077] The distribution is labeled NOISE if $p > \alpha$, where $\alpha$ is the false alarm rate. Otherwise, this system subtest indicates that a signal is most likely present.
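The decision rule, together with the direction of a significant departure from E(r) that paragraphs [0078] and [0079] interpret, can be sketched as two small functions. The Python names `classify` and `interpret` are hypothetical, and the returned labels simply paraphrase the text.

```python
def classify(p: float, alpha: float = 0.05) -> str:
    """Decision rule: p > alpha -> NOISE, otherwise SIGNAL + NOISE."""
    return "NOISE" if p > alpha else "SIGNAL + NOISE"


def interpret(r: int, expected: float, p: float, alpha: float = 0.05) -> str:
    """Attach the direction of a significant departure of r from E(r)."""
    decision = classify(p, alpha)
    if decision == "NOISE":
        return decision
    if r < expected:
        return decision + ": fewer runs than expected (grouping or clustering)"
    return decision + ": more runs than expected (repeated alternating pattern)"
```

Note that p equal to $\alpha$ is treated as a rejection of the noise hypothesis, matching "labeled NOISE if $p > \alpha$; otherwise signal".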
[0078] For the interpretation of a significant outcome (SIGNAL + NOISE): if r is significantly lower than the expected value E(r), this implies a grouping or clustering of measurements (for example, a periodic function produced by rotating or reciprocating machinery). Other possible forms include parabolic and helical surface functions.

[0079] If r is significantly higher than the expected value E(r), this implies a repeated and alternating pattern in the measurements. It should be noted that the null hypothesis of "noise only" is analogous to the hypothesis of "NO TARGET" in signal detection theory, and the alternative is analogous to "TARGET".

[0080] As an example of the calculations for this important module of the subsystem assessment protocol, assume $n_1 = 8$, $n_2 = 21$ ($n = 29$), and r = 6 (there are over four million possible arrangements for this sample). The data may be analyzed by the exact two-tailed probability method and by the approximate Gaussian distribution method.

[0081] Since the number of runs is even, the probability of this many runs or less from Equation (7) is:

$$\Pr(r \le 6) = \frac{2}{\binom{29}{8}} \sum_{k=1}^{3} \binom{7}{k-1}\binom{20}{k-1} = .0019 \qquad (8)$$

and the doubled two-tailed probability is $p = .0038$.
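The exact lower tail can be checked with a short Python sketch. It assumes Equation (7) is the standard exact distribution of the number of runs $r$ for $n_1 + n_2$ binary marks (even $r = 2k$: $2\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}/\binom{n}{n_1}$; odd $r = 2k+1$: the sum of the two cross terms); the helper names are hypothetical. Summing only the even-$r$ terms, as Equation (8) does, reproduces .0019; adding the odd run counts $r = 3$ and $r = 5$ gives a full lower tail of about .0023 (doubled, about .0047). Either value lies well below $\alpha$ = .05 or .01.

```python
from math import comb


def runs_pmf(r: int, n1: int, n2: int) -> float:
    """Exact Pr(R = r) for the number of runs among n1 + n2 binary marks
    under the noise-only (random ordering) hypothesis."""
    total = comb(n1 + n2, n1)
    if r % 2 == 0:                      # even number of runs, r = 2k
        k = r // 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:                               # odd number of runs, r = 2k + 1
        k = (r - 1) // 2
        ways = (comb(n1 - 1, k - 1) * comb(n2 - 1, k)
                + comb(n1 - 1, k) * comb(n2 - 1, k - 1))
    return ways / total


def runs_lower_tail(r_obs: int, n1: int, n2: int, even_only: bool = False) -> float:
    """Pr(R <= r_obs); even_only=True keeps only the even-r terms, as in Equation (8)."""
    return sum(runs_pmf(r, n1, n2)
               for r in range(2, r_obs + 1)
               if not even_only or r % 2 == 0)


even = runs_lower_tail(6, 8, 21, even_only=True)
full = runs_lower_tail(6, 8, 21)
print(f"even-only: {even:.4f}, doubled {2 * even:.4f}")  # even-only: 0.0019, doubled 0.0038
print(f"all r<=6:  {full:.4f}, doubled {2 * full:.4f}")  # all r<=6:  0.0023, doubled 0.0047
```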

[0082] The small sample normal approximation method with continuity correction factor is provided from Equation (3):

$$z = \frac{|r - E(r)| - 0.5}{\sigma_r} = 2.91 \qquad (9)$$

and the p value from Equation (4) is:

$$p = \Pr(|Z| \ge 2.91) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-|2.91|}^{+|2.91|} \exp\left(-0.5x^2\right) dx = .0036 \qquad (10)$$

[0083] Each method gives almost identical results at a high degree of precision. If the false alarm rate is .05 or .01, then $p < \alpha \Rightarrow$ Signal + Noise. Also, since $r < E(r)$, a mechanism producing periodic motion is suspected.

[0084] It is noted that the Runs test has shown high power (the probability of calling a signal correctly) to detect input signal time waveforms. In one experiment, the Runs test quickly detected a signal for a fifty-point hull-enclosed elliptic paraboloid with a high degree of probability.
[0085] The significance probability was $p = .000050834$ by Equations (3) and (4), which represents the likelihood that this waveform is actually random noise. Since the number of runs was observed to be far less than expected, this indicated a strong structural grouping or clustering of measurements (for example: a periodic/parabolic or a helical surface function).

[0086] Method B (R Ratio)

[0087] A prior art measure useful in the interpretation of outcomes is the R ratio. The R ratio is defined as the ratio of observed to theoretically expected occupancy rates in partitioned space:

[0088]

$$R = \frac{m}{k\theta} \qquad (11)$$

[0089] where $m$ = the observed number of cells occupied (non-empty) in partitioned space, $k$ = the number of spatial partitions, and $\theta = 1 - e^{-\lambda t}$, a Poisson measure specifying the probability that a partition is non-empty in a sample and the proportion of cells expected to be non-empty in a random distribution. $k\theta$ is the Poisson mean number of non-empty partitions.

[0090] The range of sample values for R indicates: $R < 1$ (clustered distribution); $R \approx 1$ (random distribution); and $R > 1$ (uniform distribution). The minimum R is $R_{min} = 1/k\theta$ and the maximum R is $R_{max} = N/k\theta$, where $N$ is the sample size.

[0091] The R ratio is graphed as a linear function of m in a sample for $1 \le m \le N$. This measure is used in conjunction with prior art methods in deciding whether to accept or to reject a randomness hypothesis.

[0092] An R ratio between 0.90 and 1.10 is indicative of noise. Outside of that range, a signal waveform should be suspected. For highly skewed distributions ($k$ much larger than $N$), a signal structure is suspected when $R \ge R_{max}$.
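The R ratio of Equation (11) and the heuristic bands of [0092] can be sketched as follows. As an assumption, $\lambda t$ is estimated as the number of points divided by the number of partitions (the average occupancy, as in the Table 4 definitions), and the function names are hypothetical. The usage line plugs in the counts reported later for Table 5 (125 cells, 48 points, 39 non-empty cells).

```python
from math import exp


def r_ratio(m: int, k: int, n_points: int) -> dict:
    """R = m / (k*theta), with theta = 1 - exp(-lambda_t) and lambda_t = n_points / k."""
    theta = 1.0 - exp(-n_points / k)     # probability that a cell is non-empty
    k_theta = k * theta                  # Poisson mean number of non-empty cells
    return {
        "k_theta": k_theta,
        "R": m / k_theta,
        "R_min": 1.0 / k_theta,          # only one cell occupied (m = 1)
        "R_max": n_points / k_theta,     # every point in its own cell (m = N)
    }


def heuristic_label(R: float, R_max: float, highly_skewed: bool = False) -> str:
    """Descriptive label per [0092]; no confidence level is attached."""
    if highly_skewed:                    # k much larger than N
        return "SIGNAL suspected" if R >= R_max else "NOISE"
    return "NOISE" if 0.90 <= R <= 1.10 else "SIGNAL suspected"


res = r_ratio(m=39, k=125, n_points=48)
print(round(res["k_theta"], 2), round(res["R"], 2), heuristic_label(res["R"], res["R_max"]))
# 39.86 0.98 NOISE
```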

[0093] The R ratio is a heuristic measure only, in that no probability bands of confidence are associated with the computed value. The gathered results should merely confirm or deny a random process when read in conjunction with the results derived from previously developed probability and statistical analyses. Method D, described later, provides a statistical assessment of the R ratio and a method to determine a ninety percent, ninety-five percent or ninety-nine percent confidence band for R. This capability significantly expands prior art methods.

[0094] Along with the correlation module measures in Module C, the R ratio test is a second measure for detecting readings. In operational use, the calculation of the R ratio should be embedded in the testing procedure for Method D.

Method  C  (Correlation  Module) 

[0095] The use of a multiple linear correlation R for one criterion or dependent variable (usually time t) and c predictors or independent variables (which are measurements coincident with time) of sample size N is one measure employed to correct the paradox mentioned above, in which randomness assessment test readings provide false results for deterministic multivariate functions. The range is $0 \le R_{t\cdot y,z} \le 1$, where values near 0 indicate randomness. This statistical measure will help detect threats to the integrity of the method for a class of linear functions. It enhances the likelihood that a correct decision is made and lessens the likelihood that an incorrect decision is made in regard to "signal" vs. "noise".

[0096] The squared multiple correlation index for predictors (Y and Z) is derived from the ordinary least squares minimization technique and can be expressed as a weighted sum:

$$R^2_{t\cdot y,z} = \frac{\beta_y r_{ty}\sigma_y + \beta_z r_{tz}\sigma_z}{\sigma_t} = \frac{r_{ty}^2 + r_{tz}^2 - 2r_{ty}r_{tz}r_{yz}}{1 - r_{yz}^2}; \quad 0 \le R^2_{t\cdot y,z} \le 1; \quad -1 \le r_{ty}, r_{tz}, r_{yz} \le +1 \qquad (12)$$

where $\beta_y$ and $\beta_z$ are the beta (regression) weights; $r_{ty}$, $r_{tz}$ and $r_{yz}$ are the linear correlations between/among time and the measurements; $\sigma_y$ and $\sigma_z$ are the standard deviations of the measurements; and $\sigma_t$ is the standard deviation of time.
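Equation (12) needs only the three zero-order correlations, which the sketch below computes in plain Python (no statistics library assumed). The function names are hypothetical; the clamp to [0, 1] only guards against floating-point round-off, and the formula is undefined when Y and Z are perfectly correlated ($r_{yz} = \pm 1$).

```python
from math import sqrt


def pearson(a, b):
    """Zero-order (Pearson) correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def multiple_r(t, y, z):
    """R_{t.y,z}: multiple correlation of criterion t on predictors y and z, Equation (12)."""
    r_ty, r_tz, r_yz = pearson(t, y), pearson(t, z), pearson(y, z)
    r2 = (r_ty**2 + r_tz**2 - 2 * r_ty * r_tz * r_yz) / (1 - r_yz**2)
    return sqrt(min(1.0, max(0.0, r2)))  # clamp round-off, then take the root


# Usage: time that is an exact linear function of the two amplitudes gives R = 1.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0]
t = [2 * a + 3 * b for a, b in zip(y, z)]
print(round(multiple_r(t, y, z), 6))  # 1.0
```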

[0097] The driving factors in Equation (12) are the zero-order intercorrelations ($r_{ty}$ and $r_{tz}$) of the amplitude measures with time. If the amplitude measures are random, no systematic relationship should exist in the time domain. This leads to an overall composite multiple correlation approaching zero, a situation that is indicative of noise.

[0098]  This  correlation  function  is  known  to  those  skilled 

in  the  art.  The  multiple  R  is  tested  for  a  difference  from  0 
(randomness)  by  the  statistical  F  (variance-ratio)  distribution 
(with  c  and  N  -  c  -  1  degrees  of  freedom)  using  the  following 
distributional  relation: 
$$F = \frac{R^2_{t\cdot y,z}}{1 - R^2_{t\cdot y,z}} \cdot \frac{N - c - 1}{c} \sim F(c,\, N - c - 1) \qquad (13)$$


[0099] If $R^2_{t\cdot y,z}$ is approximately zero, it can be concluded that the data conform to a random distribution. The hypothesis set is typically two-tailed.

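[0100] In the present invention, c is representative of two independent variables (Y, Z). The significance probability p is obtained by direct evaluation of the distribution density F. That is,

$$p = \int_{F}^{\infty} \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{x^{\nu_1/2 - 1}}{\left(1 + \frac{\nu_1}{\nu_2}x\right)^{(\nu_1 + \nu_2)/2}}\, dx \qquad (14)$$

where $\nu_1 = c$, $\nu_2 = N - 3$, $F = \dfrac{R^2_{t\cdot y,z}}{1 - R^2_{t\cdot y,z}} \cdot \dfrac{\nu_2}{\nu_1}$, and $\Gamma(\cdot)$ is the complete gamma function.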


[0101] The value p is interpreted by comparison to the false alarm rate (for example: $p > \alpha \Rightarrow$ Noise; otherwise, $p < \alpha \Rightarrow$ Signal + Noise).

[0102] For example, in one typical pseudo-random data set analyzed with fifty measurements, $R_{t\cdot y,z} = .0424$ and $p \approx 0.96$ (NOISE) by Equation (14).
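For the two-predictor case used here ($c = 2$), the tail integral in Equation (14) has a closed form: for an F variate with 2 numerator degrees of freedom, $\Pr(F \ge f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, and substituting F from Equation (13) reduces this to $p = (1 - R^2_{t\cdot y,z})^{(N-3)/2}$. The Python sketch below (hypothetical function name) uses that identity; for other values of c, `scipy.stats.f.sf(F, c, N - c - 1)` evaluates the same upper tail numerically. It reproduces $p \approx 0.96$ for $R = .0424$ with $N = 50$, and $p \approx .57$ for the average $R = .154$ reported in [0158].

```python
def multiple_r_p_value(R: float, N: int) -> float:
    """Significance probability of Equation (14) for c = 2 predictors (Y, Z).

    F = R^2/(1 - R^2) * (nu2/nu1) with nu1 = 2, nu2 = N - 3 (Equation (13)).
    For nu1 = 2 the F upper tail is (1 + 2F/nu2) ** (-nu2/2) exactly.
    """
    nu1, nu2 = 2, N - 3
    F = (R**2 / (1 - R**2)) * (nu2 / nu1)
    return (1 + 2 * F / nu2) ** (-nu2 / 2)


for R in (0.0424, 0.154):
    p = multiple_r_p_value(R, N=50)
    print(R, round(p, 2), "NOISE" if p > 0.05 else "SIGNAL + NOISE")
# 0.0424 0.96 NOISE
# 0.154 0.57 NOISE
```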

[0103] In order to recognize the minimum value that the multiple correlation must reach for the noise hypothesis to be rejected, Equation (14) is solved for $R_{t\cdot y,z}$:

$$R_{t\cdot y,z} \ge \sqrt{1 - p^{2/\nu_2}} \qquad (15)$$

where p functions as the a priori false alarm rate $\alpha$. For example: if $N = 50$ (so $\nu_2 = 47$) and p is set to .05, then a correlation of $R > 0.346$ is needed to suspect a non-noise distribution with a Probability of False Alarm (PFA) of five percent. Any value less than 0.346 suggests the data to be random at this false alarm rate. Using a one percent PFA, $R > .422$ is needed to suspect a non-random waveform. The minimum R value in Equation (15) is inversely related to sample size for a fixed p value.
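For $c = 2$, Equation (14) reduces to $p = (1 - R^2)^{\nu_2/2}$ with $\nu_2 = N - 3$, and solving that expression for R gives Equation (15). A minimal sketch (hypothetical function name) that reproduces the 0.346 and 0.422 thresholds for $N = 50$ and shows the threshold falling as N grows:

```python
from math import sqrt


def min_detectable_r(alpha: float, N: int) -> float:
    """Smallest R_{t.y,z} that rejects the noise hypothesis at false alarm rate alpha,
    Equation (15) with nu2 = N - 3 (two predictors)."""
    nu2 = N - 3
    return sqrt(1 - alpha ** (2 / nu2))


for N in (25, 50, 100):
    print(N, round(min_detectable_r(0.05, N), 3), round(min_detectable_r(0.01, N), 3))
# the N = 50 row prints 0.346 and 0.422
```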

[0104] In addition to the linear multiple correlation, the present method specifies computing discrete normalized Autocorrelation Function (ACF) indices for one, two and three time lags, or alternatively more (depending on sample size N), if the multiple linear R shows noise.

[0105]  Autocorrelation  is  the  cross-correlation  of  a  signal 

with  itself.  The  measure  is  designed  to  detect  repeating 
patterns  in  nonlinear  time-series  distributions  (e.g.,  periodic, 
quasi-periodic,  parabolic,  etc.). 

[0106] Whereas the linear measure $R_{t\cdot y,z}$ will detect a linear trend relationship in time, it will not necessarily detect nonlinear relationships among the amplitude measurements. The autocorrelation function will better detect such nonlinear trends, which other testing procedures may mislabel as noise.

[0107] As provided below, the method computes autocorrelations for two models (first with $Y_t$ as a dependent variable and then with $Z_t$ as a dependent variable). Table 2 illustrates the structure for an autocorrelation analysis of 1-, 2- and 3-lags, with $Y_t$ as the dependent variable.


Table 2.

Model Illustrating Three-lag Autocorrelations
(Amplitude $Y_t$ as a dependent variable)

(1)        (2)        (3)        (4)        (5)        (6)
t          Y_t        Z_t        Z_{t+1}    Z_{t+2}    Z_{t+3}
Time       depend     indep      indep      indep      indep
                                 Lag-1      Lag-2      Lag-3
t_0        y_0        z_0        z_1        z_2        z_3
t_1        y_1        z_1        z_2        z_3        z_4
...        ...        ...        ...        ...        ...
t_{N-4}    y_{N-4}    z_{N-4}    z_{N-3}    z_{N-2}    z_{N-1}
t_{N-3}    y_{N-3}    z_{N-3}    z_{N-2}    z_{N-1}
t_{N-2}    y_{N-2}    z_{N-2}    z_{N-1}
t_{N-1}    y_{N-1}    z_{N-1}

[00100] This procedure amounts to computing successive multiple linear correlation indices ($0 \le R \le 1$) between the dependent variable $Y_t$ and the time lags $Z_{t+1}$, $Z_{t+2}$ and $Z_{t+3}$, and other lags, analogous to the structure in Equation (12). That is, the autocorrelation of lag-1 is $R_{y_t\cdot z_t,z_{t+1}}$ (column 2 with columns 3 and 4); the autocorrelation of lag-2 is $R_{y_t\cdot z_t,z_{t+2}}$ (column 2 with columns 3 and 5); and the autocorrelation of lag-3 is $R_{y_t\cdot z_t,z_{t+3}}$ (column 2 with columns 3 and 6). Additional lags are computed in a similar fashion.
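The Table 2 and Table 3 layouts translate into a few lines of Python. This sketch (hypothetical names, plain-Python correlations) keeps the rows $t = 0, \dots, N-1-k$ for which the lag-k column exists and applies Equation (12) with the dependent amplitude as criterion and the concurrent and k-step-ahead values of the other amplitude as predictors. Calling it with the arguments swapped gives the reverse model $R_{z_t\cdot y_t,y_{t+k}}$ of [0110]. The demonstration series is hypothetical.

```python
from math import sqrt


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def multiple_r(dep, x1, x2):
    """Equation (12) with dep as criterion and x1, x2 as predictors."""
    r1, r2, r12 = pearson(dep, x1), pearson(dep, x2), pearson(x1, x2)
    rsq = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return sqrt(min(1.0, max(0.0, rsq)))


def lagged_multiple_r(dep, other, k):
    """R_{dep_t . other_t, other_{t+k}} over rows t = 0 .. N-1-k (Table 2 / Table 3)."""
    n = len(dep)
    return multiple_r(dep[: n - k], other[: n - k], other[k:])


# Hypothetical data in which Y_t equals the next Z sample, a pure lag-1 structure.
z = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
y = z[1:] + [0.0]
for k in (1, 2, 3):
    # Y_t-dependent model (Table 2) and Z_t-dependent model (Table 3)
    print(k, round(lagged_multiple_r(y, z, k), 3), round(lagged_multiple_r(z, y, k), 3))
```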

[0108] If the data are random, the simple k-lag zero-order intercorrelations of the dependent variable $Y_t$ with the amplitude measures, $r_{y_t z_{t+k}}$, will be zero, driving the R value toward 0 in the multiple autocorrelation formula for all lag lengths.

[0109]  The  second  modeling  approach  requires  treating  Zt  as 


the  dependent  variable  as  shown  in  Table  3  for  an  exemplary 
three-lag  analysis. 


Table 3

Model Illustrating Three-lag Autocorrelations
(Amplitude $Z_t$ as a dependent variable)

(1)        (2)        (3)        (4)        (5)        (6)
t          Z_t        Y_t        Y_{t+1}    Y_{t+2}    Y_{t+3}
Time       depend     indep      indep      indep      indep
                                 Lag-1      Lag-2      Lag-3
t_0        z_0        y_0        y_1        y_2        y_3
t_1        z_1        y_1        y_2        y_3        y_4
...        ...        ...        ...        ...        ...
t_{N-4}    z_{N-4}    y_{N-4}    y_{N-3}    y_{N-2}    y_{N-1}
t_{N-3}    z_{N-3}    y_{N-3}    y_{N-2}    y_{N-1}
t_{N-2}    z_{N-2}    y_{N-2}    y_{N-1}
t_{N-1}    z_{N-1}    y_{N-1}

[0110] The reverse correlation procedure amounts to computing the multiple linear correlation index between the dependent variable $Z_t$ and the time lags $Y_{t+1}$, $Y_{t+2}$ and $Y_{t+3}$, analogous to the structure in Equation (12). That is, for three lags: the autocorrelation of lag-1 is $R_{z_t\cdot y_t,y_{t+1}}$ (column 2 with columns 3 and 4); the autocorrelation of lag-2 is $R_{z_t\cdot y_t,y_{t+2}}$ (column 2 with columns 3 and 5); and the autocorrelation of lag-3 is $R_{z_t\cdot y_t,y_{t+3}}$ (column 2 with columns 3 and 6). Additional lags are computed in a similar fashion. As noted above, randomness occurs when the dependent variable does not correlate with the lagged variables, that is, when the simple lag correlations are near zero (for example: $r_{z_t y_{t+k}} \approx 0$).
[0111] The reason for this more complicated approach to three-dimensional autocorrelation analysis is two-fold. First, in a closed system of data inputs, one cannot specify the dependent variable a priori in a meaningful manner, nor can one know the exact nonlinear mathematical structure of the waveform to be detected (periodic, parabolic, etc.). Second, the linear R and the k-lag autocorrelations will show $R_{t\cdot y,z} \approx R_{y_t\cdot z_t,z_{t+k}} \approx R_{z_t\cdot y_t,y_{t+k}} \approx 0$ for random noise. However, for signal waveforms, the k-lag autocorrelations $R_{y_t\cdot z_t,z_{t+k}}$ and $R_{z_t\cdot y_t,y_{t+k}}$ are not necessarily both 0. At least one model is expected to detect signal structure for nonlinear forms.

[0112]  This  approach  will  enhance  the  likelihood  that  a 

signal  waveform  will  be  detected  and  not  inaccurately  be  labeled 
as  noise  when  used  in  conjunction  with  the  Runs  test  and  other 
analysis  procedures  of  the  present  method. 

[0113] When a three-dimensional data set is not a time series, correlational analysis can be complicated in real-time experiments in which a rapid yes-no classification is required. An example of this circumstance is when a causal model is not present to specify the dependent and independent variables in order to detect and classify the three-dimensional functional form.
[0114] Many variable relational techniques are known to those skilled in the art, including multivariate nonlinear curve fitting, partial correlation, canonical correlation analysis, pattern recognition, image processing, feature extraction, and other multivariate data reduction techniques. These are large sample methods requiring significant analyst input to determine the interpretation of outcomes. The present method focuses on time series analyses with potentially sparse data sets in real-time operating systems in which a rapid classification of noise/signal is required for unknown waveforms.

[0115] The hypothesis of no relationship ($R \approx 0$) in time series data will be resolved by comparing the observed autocorrelation R values for N discrete sample points against the approximate standard error on a correlogram (for example: accept the noise hypothesis if the autocorrelation measure lies between the critical values of the white noise band,

$$0 \le R \le \frac{1.96}{\sqrt{N}},$$

for a five percent false alarm rate). If R falls outside the ninety-five percent band, a signal structure is suspected. Approximately five percent of the autocorrelations of white noise fall outside the band.
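The white-noise band test is a one-line comparison. A minimal sketch (hypothetical names) applied to the FIG. 1 lag values reported in [0116] with $N = 50$; the critical values 1.645 and 2.576 give the ninety and ninety-nine percent bands. Taking the absolute value also covers signed simple lag correlations.

```python
from math import sqrt

Z_CRIT = {0.10: 1.645, 0.05: 1.96, 0.01: 2.576}  # two-tailed Gaussian critical values


def white_noise_bound(N: int, alpha: float = 0.05) -> float:
    """Half-width of the correlogram white-noise band, z_crit / sqrt(N)."""
    return Z_CRIT[alpha] / sqrt(N)


def classify_lag(R: float, N: int, alpha: float = 0.05) -> str:
    """NOISE if the (absolute) autocorrelation stays inside the band."""
    return "NOISE" if abs(R) <= white_noise_bound(N, alpha) else "SIGNAL suspected"


print(round(white_noise_bound(50), 3))                        # 0.277
print([classify_lag(R, 50) for R in (0.152, 0.154, 0.113)])   # ['NOISE', 'NOISE', 'NOISE']
```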


[0116] The random waveform in FIG. 1 depicts the following autocorrelation coefficients: Lag 1: $R_{y_t\cdot z_t,z_{t+1}} = 0.152$; Lag 2: $R_{y_t\cdot z_t,z_{t+2}} = 0.154$; Lag 3: $R_{y_t\cdot z_t,z_{t+3}} = 0.113$. All indicate white noise, since the minimum intensity for deciding "signal" is $1.96/\sqrt{50} \approx 0.277$ with fifty measurements at a five percent false alarm rate.

[0117] As a rationale for this two-step correlational procedure, assume a noise-free parabolic function in two-space, $f(t) = y = 9 - t^2$, plotted for $t = -3, \dots, +3$ (seven data points). The linear relationship between t and y, $r_{ty}$, is zero, which indicates noise.


[0118] Thus far, the method indicates a random distribution for a simple deterministic function. However, the autocorrelations show an increasing intensity in relationship (for example: $r_{y_t y_{t+1}} = +.36$; $r_{y_t y_{t+2}} = -.48$; $r_{y_t y_{t+3}} = -.84$; the fourth lag shows $r_{y_t y_{t+4}} = -.96$, and the fifth lag shows $r_{y_t y_{t+5}} = -1.00$). The parabolic signal structure is revealed by the successive serial correlations. With each lag, the plot of $y_t$ with $y_{t+k}$ becomes more linear (inversely).
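The serial correlations quoted for the parabola can be reproduced with a short sketch, under the assumption that each lag-k coefficient is the ordinary Pearson correlation of the overlapping pairs $(y_t, y_{t+k})$ rather than the textbook ACF estimator that uses the full-series mean. With that assumption the printed values match [0117] and [0118].

```python
from math import sqrt


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


t = list(range(-3, 4))                 # seven points, t = -3 .. +3
y = [9 - v * v for v in t]             # noise-free parabola y = 9 - t^2

r_ty = pearson(t, y)                   # linear test alone: 0, i.e. looks like noise
serial = [pearson(y[:-k], y[k:]) for k in range(1, 6)]

print(round(r_ty, 2))                  # 0.0
print([round(r, 2) for r in serial])   # [0.36, -0.48, -0.84, -0.96, -1.0]
```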

[0119]  In  this  case,  the  first  three  lags  are  likely 
sufficient  to  indicate  a  signal  structure.  Such  a  change  cannot 
be  observed  with  random  noise  regardless  of  sample  size  or  the 
number  of  time  lags  since  the  correlations  will  fall  within  the 
boundaries  of  the  critical  white  noise  band. 
[0120] By contrast, in the case of a simple circle function in two-space, $x^2 + y^2 = 9$, $-3 \le x \le +3$ (twelve data points), the Runs procedure quickly detects a signal waveform at a high level of certainty. The autocorrelations increase in intensity only by the fourth and fifth lag. These examples demonstrate that such experiments require a flexible mix of testing procedures in order to arrive at a correct time waveform classification.

[0121]  The  autocorrelation  analysis  (with  Yt  and  Zt  as 

separate  dependent  variables)  detects  signals  in  similar  three- 
dimensional  algebraic/geometric  and  periodic  functional  forms 
which  might  otherwise  be  mislabeled  as  noise. 

Method  D  (Normal  Approximation  z-Test  for  Poisson  Distribution 
Based  on  the  Number  of  Non-Empty  Partitions) 

[0122] This testing procedure, derived from the Central Limit Theorem, is used for evaluating the following binary non-directional hypothesis set regarding the number of cells that are non-empty in a partitioned volume as compared to the expected number or mean in a random distribution:

$$H_0: k\theta = m \quad \text{(NOISE)}$$

$$H_1: k\theta \neq m \quad \text{(SIGNAL + NOISE)}$$
The normal approximation Poisson z test statistic takes the form

$$z = \frac{m - k\theta}{\sqrt{k\theta}} \sim N(0, 1) \qquad (16)$$

where $N(0, 1)$ indicates that the z measure is an approximately normally distributed random variable with a mean of 0 and a variance of 1 in samples of more than twenty-five measurements.

[0123] As discussed in Method B, the quantity $\theta$ is the probability that a cell is non-empty in a random distribution population, and the quantity $k\theta$ is the mean or average number of non-empty partitions in a stochastic random distribution.

[0124] Since the population parameter $\theta$ is rarely known, the sample Poisson measure $\theta = 1 - \exp(-\lambda t)$ is used, where $\lambda t$ is the average number of points per partition. The quantity $k\theta$ is the sample mean or the average number of partitions expected to be non-empty in a spatial random sample. The sample measure m is the actual number of the k partitions that are non-empty. These quantities are defined further by using a Poisson frequency analysis notation (as described in the note for Table 4).

[0125] As discussed earlier, the operator would compare the value of z against a probability of false alarm $\alpha$. The significance probability p of the observed value z is calculated with Equation (4):

$$p = \Pr(|Z| \ge |z|) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-|z|}^{+|z|} \exp\left(-0.5x^2\right) dx, \quad -\infty < z < +\infty, \quad 0 \le p \le 1 \qquad (17)$$

where $|\cdot|$ denotes an absolute value. The decision protocol below is adopted:

$p > \alpha \Rightarrow$ Noise
$p < \alpha \Rightarrow$ Signal + Noise

It is seen that if $m \approx k\theta$, or $R \approx 1$, then $z \approx 0$ and $p \approx 1$ (noise).
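A minimal sketch of the Method D test, Equations (16) and (17), assuming as in [0124] that $\theta = 1 - \exp(-\lambda t)$ with $\lambda t$ taken as points per partition; the function name is hypothetical. With the counts of the later example (m = 39 non-empty cells out of k = 125, 48 points) it returns $p \approx .89$, the value reported in [0157].

```python
from math import erfc, exp, sqrt


def poisson_z_test(m: int, k: int, n_points: int, alpha: float = 0.05):
    """Normal-approximation z test on the number m of non-empty partitions."""
    lam_t = n_points / k                    # average points per partition
    k_theta = k * (1.0 - exp(-lam_t))       # expected non-empty partitions under noise
    z = (m - k_theta) / sqrt(k_theta)       # Equation (16)
    p = erfc(abs(z) / sqrt(2))              # Equation (17), two-tailed
    label = "NOISE" if p > alpha else "SIGNAL + NOISE"
    return z, p, label


z, p, label = poisson_z_test(m=39, k=125, n_points=48)
print(f"z = {z:.2f}, p = {p:.2f}, {label}")  # z = -0.14, p = 0.89, NOISE
```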

[0126] Based on Equation (16), a ninety-five percent or ninety-nine percent confidence interval (CI) can be constructed for the point estimate m when $H_0$ is true (noise); that is, to determine the range of m that is indicative of noise/signal. A ninety-five percent confidence interval ($CI_{95}$) is obtained by solving for m in Equation (16), written as an algebraic probability statement:

$$CI_{95}(m) = k\theta - 1.96\sqrt{k\theta} \le m \le k\theta + 1.96\sqrt{k\theta},$$

or

$$CI_{95}(m) = k\theta \pm 1.96\sqrt{k\theta} \qquad (18)$$

where ±1.96 is the critical value of the Gaussian distribution when the false alarm rate is five percent for a two-tailed hypothesis.

[0127] Equation (18) represents the range over which m can vary when a distribution is random, with ninety-five percent certainty. A ninety-nine percent CI is derived in a similar manner: the ±1.96 is changed to ±2.576; for ninety percent, use ±1.645.

[0128] The lower limit on m can be taken as $m_l = k\theta - 1.96\sqrt{k\theta}$ and the upper limit as $m_u = k\theta + 1.96\sqrt{k\theta}$. This allows a useful measure in terms of the intuitive R ratio, $m/k\theta$.

[0129] For example, if $N = k = 25$, then $k\theta = 25[1 - \exp(-1)] = 15.8$ cells are non-empty on the average in a random distribution. A ninety-five percent CI for m is $CI_{95}(m) = k\theta \pm 1.96\sqrt{k\theta} = (8, 24)$, rounded. Translated into the R ratio in terms of $m_l$ and $m_u$:

$$R_{noise} = \left(\frac{m_l}{k\theta}, \frac{m_u}{k\theta}\right) = \left(\frac{k\theta - 1.96\sqrt{k\theta}}{k\theta}, \frac{k\theta + 1.96\sqrt{k\theta}}{k\theta}\right) \approx 1 \pm .49 \approx (0.51, 1.49), \qquad (19)$$

or in general,

$$R_{noise} = 1 \pm \frac{1.96}{\sqrt{k\theta}} \quad \text{(ninety-five percent confidence)},$$

which represents the lower and upper boundary on the R ratio of a random distribution. Replace 1.96 with 1.645 or 2.576, respectively, for ninety percent and ninety-nine percent confidence.
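The noise bands of Equations (18) and (19) can be tabulated with the sketch below (hypothetical names; the same $\lambda t$ assumption as in [0124]). It reproduces the $N = k = 25$ example and, for the Table 5 counts (125 cells, 48 points), the bands (27, 52) for m and (.69, 1.31) for R quoted in [0156].

```python
from math import exp, sqrt

Z_CRIT = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def noise_bands(k: int, n_points: int, conf: float = 0.95):
    """Return k*theta, the CI for m (Equation (18)) and the R_noise band (Equation (19))."""
    k_theta = k * (1.0 - exp(-n_points / k))
    z = Z_CRIT[conf]
    half_m = z * sqrt(k_theta)
    m_band = (k_theta - half_m, k_theta + half_m)
    r_band = (1 - z / sqrt(k_theta), 1 + z / sqrt(k_theta))
    return k_theta, m_band, r_band


for k, n_points in ((25, 25), (125, 48)):
    kt, m_band, r_band = noise_bands(k, n_points)
    print(round(kt, 2), tuple(round(v) for v in m_band), tuple(round(v, 2) for v in r_band))
# 15.8 (8, 24) (0.51, 1.49)
# 39.86 (27, 52) (0.69, 1.31)
```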

[0130] If $N = k = 10^3$, $R_{noise} = 1 \pm .07$, which shows that the range narrows as the sample size increases until, in the limit as $N \to \infty$, $R_{noise} \to 1.0$ (pure noise).

[0131] One further measure may be of use: if $N \approx k$, a quick result is:

$$R_{noise} \approx 1 \pm \frac{2.1}{\sqrt{k}} \quad \text{(ninety percent confidence); or}$$

$$R_{noise} \approx 1 \pm \frac{2.5}{\sqrt{k}} \quad \text{(ninety-five percent confidence); or}$$

$$R_{noise} \approx 1 \pm \frac{3.2}{\sqrt{k}} \quad \text{(ninety-nine percent confidence).}$$
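The quick constants follow from Equation (19) once $N \approx k$, so that $\lambda t \approx 1$ and $\theta \approx 1 - e^{-1} \approx 0.632$. A short derivation:

```latex
\sqrt{k\theta} \approx \sqrt{0.632\,k} \approx 0.795\sqrt{k}
\quad\Longrightarrow\quad
R_{noise} = 1 \pm \frac{z_{crit}}{\sqrt{k\theta}} \approx 1 \pm \frac{z_{crit}}{0.795\sqrt{k}}

\frac{1.645}{0.795} \approx 2.07, \qquad
\frac{1.96}{0.795} \approx 2.47, \qquad
\frac{2.576}{0.795} \approx 3.24
```

These round to the 2.1, 2.5 and 3.2 used in [0131].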


[0132] Note that for highly skewed distributions (k being much larger than N), the value of the maximum R should be obtained (United States Patent No. 7,409,323 demonstrates this methodology). A signal structure is suspected when $R > R_{max}$.
[0133] These analyses demonstrate the usefulness of the R ratio, $R = m/k\theta$, when coupled with a confidence interval analysis. It is a fast randomness assessment technique as well as a means to detect the paradox of deterministic relations being declared random.

[0134]  As  mentioned  above,  the  calculation  of  the  R  ratio 

should  be  embedded  within  the  testing  procedure  in  Method  D. 

The  tests  are  described  separately  for  narrative  purposes  only. 

Method  E  (Chi-square  Test  of  Homogeneity  -  An  Alternative) 
[0135]  The  Chi-square  test  is  used  to  decide  if  the  Poisson 

distribution  is  adequate  to  model  a  random  process. 


Table 4.

POISSON ANALYSIS
Frequency Table Protocol and Definitions

(1)      (2)      (3)        (4)              (5)
k        n_k      k·n_k      P(k; λt)         N·P(k; λt)
0        n_0      0          P(0; λt)         N·P(0; λt)
1        n_1      n_1        P(1; λt)         N·P(1; λt)
...      ...      ...        ...              ...
K        n_K      K·n_K      P(K; λt)         N·P(K; λt)
TOTAL    N        T          1.00 (approx.)   N (approx.)

$$n_0 = N - \sum_{k=1}^{K} n_k \quad \text{(zero bin is calculated last)}. \qquad (19)$$

[0136]  V  is  the  computed  volume  of  the  convex  hull  polygon, 

partitioned  into  N  cubes; 

[0137] k is an index indicating an empty cell (k = 0), cells with one point (k = 1), etc.;

[0138]  K  is  the  number  of  categories  of  k; 

[0139] $n_k$ is the frequency count associated with k (NOTE: $n_k < 5$ must be combined with an adjacent cell to ensure $n_k \ge 5$. The value K is adjusted accordingly);

[0140] $N = \sum_{k=0}^{K} n_k$ is the number of partitions (cells), wherein N is defined as k in the procedures of Method D;

[0141] $T = \sum_{k=0}^{K} k \cdot n_k$ is the total number of points in a sample (known a priori). In small samples, T may not equal the input sample size.
[0142] $\lambda t \approx \dfrac{\sum_{k=0}^{K} k \cdot n_k}{\sum_{k=0}^{K} n_k} \approx \dfrac{T}{N}$ is the average occupancy rate or points per cell for the entire distribution. The relationship $t = V/N$ is the average volume of the cubic subset partitions.

[0143] $P(k; \lambda t) = \exp(-\lambda t)\dfrac{(\lambda t)^k}{k!}$, the Poisson probability distribution function. $\sum_{k=0}^{K} P(k; \lambda t) = \Pr(k = 0; \lambda t) + \Pr(k > 0; \lambda t) \approx 1$, where $\Pr(k > 0; \lambda t) = 1 - \exp(-\lambda t)$ is the probability that a cell is non-empty. The probability $\Pr(k > 0; \lambda t) = 1 - \exp(-\lambda t)$ is re-defined as $\theta$ in Equations (11), (16) and (20). When this value is multiplied by the total number of partitions N, the result provides the Poisson mean, referred to as $k\theta$, which is an important measure for Method B and Method D (large and small sample).

[0144] $n_k \approx N \cdot P(k; \lambda t)$ in a random distribution.

$m = \sum_{k=0}^{K} n_k - n_0$ is the total number of non-empty partitions.
k  =  0 
[0145] $R = \dfrac{m}{N \cdot \Pr(k > 0; \lambda t)} \approx 1$ in a completely random distribution [R Ratio of Equation (11)].

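[0146] The R ratio is therefore defined in each notation system as:

$$R = \frac{\sum_{k=0}^{K} n_k - n_0}{\sum_{k=0}^{K} n_k \cdot \Pr(k > 0; \lambda t)} = \frac{m}{N\left[1 - \exp\left(-\frac{T}{N}\right)\right]} = \frac{m}{k\theta}$$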

[0147] $k\theta$ is the mean or expected number of non-empty partitions. This number is an important measure used in the Module B and Module D testing procedures.

[0148] Chi-square statistic for the homogeneity test (with K − 2 degrees of freedom):

$$\chi^2 = \sum_{k=0}^{K} \frac{(o_k - e_k)^2}{e_k}, \quad \text{where } o_k = n_k; \; e_k = N \cdot P(k; \lambda t).$$


[0149]  The  Chi-square  test  for  homogeneity  is  performed  on 

the  observed  sample  frequencies  and  expected  random  noise 
Poisson  theoretical  frequencies. 
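The Table 4 protocol and the homogeneity statistic of [0148] can be sketched as below. Assumptions: $\lambda t = T/N$ as in [0142], and, as in Table 5, the last pooled row is evaluated with the Poisson probability at its lowest k (so the expected column sums to slightly less than N). Function names are hypothetical. For the Table 5 counts it gives $\chi^2 \approx 1.41$. The sketch also evaluates the closed-form chi-square tails for 1 and 2 degrees of freedom: the p = .49 quoted in [0153] appears to correspond to 2 degrees of freedom, while $\nu = 1$ gives about .23; both exceed a five percent false alarm rate, so the noise conclusion holds either way.

```python
from math import erfc, exp, factorial, sqrt


def poisson_frequency_table(n_k, T=None):
    """Columns (4) and (5) of Table 4 for observed cell counts n_k, k = 0..K.

    The last entry may be a pooled ">= K" bin; like Table 5, it is evaluated at k = K.
    """
    N = sum(n_k)                                        # number of partitions
    if T is None:
        T = sum(k * c for k, c in enumerate(n_k))       # total number of points
    lam_t = T / N                                       # average occupancy rate
    P = [exp(-lam_t) * lam_t**k / factorial(k) for k in range(len(n_k))]
    expected = [N * p for p in P]
    return lam_t, P, expected


def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x, df):
    """Upper-tail probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return erfc(sqrt(x / 2))
    if df == 2:
        return exp(-x / 2)
    raise ValueError("only df = 1 or 2 handled in this sketch")


n_k = [86, 30, 9]                                       # Table 5, column (2)
lam_t, P, expected = poisson_frequency_table(n_k)       # T = 0*86 + 1*30 + 2*9 = 48
x2 = chi_square(n_k, expected)
print(round(lam_t, 3), [round(e, 2) for e in expected], round(x2, 2))
# 0.384 [85.14, 32.69, 6.28] 1.41
print(round(chi2_sf(x2, 1), 2), round(chi2_sf(x2, 2), 2))
# 0.23 0.49
```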


Method  F  (Non-Linear  Correlation  -  An  Alternative) 

[0150] An alternative correlation function is the eta $\eta$ (or nonlinear correlation) coefficient in time-series analyses. This measure is known to provide the maximum correlation possible between the time criterion and any function, linear or nonlinear combination, of the predictors. The correlation ratio is always at least as large as the linear correlation. For example: $\eta^2_{t\cdot y,z} \ge R^2_{t\cdot y,z}$. The range is $0 \le \eta_{t\cdot y,z} \le 1$. Other alternative correlational measures for non-time series data sets are indicated in correlation Method C.

Example  of  Analysis  Procedures 

[0151] In one exemplary experiment, a sample of fifty pseudo-random data points was assigned to 125 cells of partitioned subspace within a convex hull of total volume $V \approx 516$ contained in a $10^3$ region of measurement amplitude (hull-to-region volume ratio .517, a forty-nine percent reduction in observation space).

[0152] The frequency analysis of these fifty points is shown below in Table 5, in accordance with the definitions and properties previously described with Table 4 (Poisson Analysis).

Table 5.


Poisson Frequency Analysis

(1)      (2)       (3)       (4)         (5)
k        n_k       k·n_k     P(k; λt)    N·P(k; λt)
0        86        0         .6811       85.14
1        30        30        .2616       32.69
≥2       9         18        .0502       6.28
TOTAL    N = 125   T = 48    .9929       124.11

[0153] Overall, the frequency data in Table 5 are reasonably dispersed and indicate that the Poisson model is adequate to model the three-dimensional random data enveloped in a partitioned convex hull. The primary comparison is between the data in Column (2) and the data in Column (5). The Chi-square homogeneity value for model fit is 1.41 ($\nu = K - 2 = 1$ degree of freedom) with a probability of $p = .49$ (noise distribution). The p value was obtained by direct evaluation of the integral for the Chi-square density, similar to the approaches used to compute Equations (4) and (14). Thus, the Poisson distribution is adequate to model the data as a random process embedded in a three-dimensional polygon.

[0154] Other analyses that can be obtained from Table 5 and the raw data of fifty measurements show that the number of non-empty cells amounts to $m = 39$ ($N - n_0$). The expected average number in a random distribution is

$$k\theta = 125\left[1 - \exp\left(-\frac{48}{125}\right)\right] \approx 39.86,$$

and the R ratio is $R = 39/39.86 = .98$, which is very close to pure randomness.

[0155]  In  the  alternative,  the  actual  sample  size  of  fifty 

points  may  be  used  to  carry  out  the  computations  if  the 
alternative  Module  E  is  not  used  as  part  of  the  present  method. 
[0156] The confidence analysis procedure indicates that a ninety-five percent noise band for the R ratio is (.69, 1.31), which contains the observed R. Likewise, a ninety-five percent CI for m is (27, 52), rounded. With ninety-five percent certainty, the m and R values of a random distribution are expected to fall within these ranges.

[0157] Moreover, from these data, the z test procedure of Method D can be applied, which shows a p value of .89 (noise). Here the value $k\theta = 39.86$. The Runs test returned a p value of .84 (noise).

[0158] Since partitioning is irrelevant to the correlation measure, the multiple linear correlation was obtained in fifty simulation runs with an average correlation of $R_{t\cdot y,z} = .154$ [a p value of .57 (noise)]. The noise variance ($1 - R^2$) accounts for approximately ninety-eight percent of the total variance. Autocorrelations also indicate random noise.
[0159] This is an exemplary technical analysis of the data set in accordance with the present method. Each procedure gives a similar result: the data are noise, with a high degree of certainty relative to a false alarm rate of five percent or less.

[0160] The above data analysis results are comparable to the results disclosed for the method applied to a rectangular solid in previously-referenced United States Patent No. 7,277,573. The difference is a forty-nine percent reduction in observation space with the use of the convex hull, which translates into a solution obtained in approximately half the time.

Small  Sample  Testing  Procedures  (Measurements  <  25) 

[0161] Testing Methods A, B, and C are also applicable to the small sample case.

[0162] The R ratio of Method B should be viewed as descriptive rather than inferential when applied to small samples. The suggested guideline is:

0.90 < R < 1.10 => NOISE; otherwise, SIGNAL
R > Rmax => SIGNAL [highly skewed distributions, k being much larger than N]

[0163] The correlation of Method C presents a statistical problem for small samples. The ability to reject the null hypothesis (noise) depends on the sample size: a high correlation computed on a small sample may be insufficient to reject the null hypothesis for purely statistical reasons. For this reason, a heuristic procedure for interpreting the linear and autocorrelation measures is stated in the following decision rule:

R < .32 => NOISE
R > .32 => SIGNAL + NOISE

where R refers to either the multiple linear correlation or the autocorrelation measures of any lag length for Yt and Zt as dependent variables. A correlation of .32 translates into a ten percent signal variance (.32² ≈ .10). Other users of the method may choose different cut-off values (for example, twenty percent or twenty-five percent signal variance), but the recommended level appears useful for application to time series or other in situ distributions. The level can be adapted to the results of operational testing.
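The heuristic rule of [0163], and the link between a cut-off and its signal-variance fraction, can be sketched in Python as below. The function names are hypothetical; comparing the absolute value |r| (so that a strong negative correlation also counts) and assigning R = .32 exactly to SIGNAL + NOISE are assumptions, since the text states the rule for R itself and leaves the boundary open.

```python
import math

# Sketch only: helper names are hypothetical. Using |r| and sending r == cutoff to
# SIGNAL + NOISE are assumptions; the text states the rule for R and leaves R = .32 open.


def correlation_decision(r: float, cutoff: float = 0.32) -> str:
    """Small-sample heuristic: a correlation below the cutoff is treated as noise."""
    return "NOISE" if abs(r) < cutoff else "SIGNAL + NOISE"


def signal_variance_fraction(r: float) -> float:
    """Fraction of the total variance attributable to signal, R squared."""
    return r * r


def cutoff_for_signal_variance(fraction: float) -> float:
    """Correlation cut-off corresponding to a chosen signal-variance fraction."""
    return math.sqrt(fraction)


print(round(signal_variance_fraction(0.32), 3))    # 0.102, about ten percent
print(round(cutoff_for_signal_variance(0.25), 2))  # 0.5 for a twenty-five percent rule
print(correlation_decision(0.154))                 # NOISE
```

Raising the signal-variance threshold to twenty or twenty-five percent moves the correlation cut-off to about .45 or .50, which makes the rule more reluctant to call a signal.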


Method  G  (Exact  Poisson  Distribution  Hypothesis  Test) 

[0164]  This  testing  procedure  is  the  small  sample  analogue  of 
the  normal  approximation  test  in  Method  D.  The  procedure 
provides  more  accurate  estimates  of  probabilities. 

[0165] Based on Poisson point process theory, consider a measurement set of data in a time interval Δt with corresponding measurements of magnitudes ΔY and ΔZ. That data set is considered to be purely random if the number of non-empty partitions among the k partitions (a partition is empty when it contains no observed measurements) agrees with the Poisson expectation to a specified degree. The observed number of non-empty partitions is m, as defined in the Note of Table 4 and its follow-on supporting language, referred to hereinafter as Equation (19) for all uses of the Note of Table 4. The mean or expected number of non-empty partitions in a random Poisson distribution is given by:

$$
k\vartheta = k\left(1 - e^{-\lambda t}\right)
\tag{20}
$$

where ϑ is the probability that any cell is non-empty in a completely random Poisson distribution and λt is the population parameter of the spatial Poisson process defined in Equation (19), corresponding to the average number of points observed across all three-dimensional subspace partitions in a random distribution. A spatial Poisson process is assumed to govern the mechanism. Equation (20) follows directly from the standard Poisson distribution function given in Equation (19): a cell is non-empty unless it receives zero points, which occurs with probability exp(−λt).

[0166] In one embodiment of the present method, the spatial mean λt is calculated from a frequency distribution of Poisson distributed variables in the manner recommended by Feller, Ch. 6 [Feller, William. An Introduction to Probability Theory and Its Applications, Vol. I. New York: John Wiley and Sons, 1957]. This quantity was defined in Equation (19).

[0167] The sample value of λt corresponds to the average "hit rate," or average number of points, across all cubic whole and part subspaces of the convex hull in three dimensions, and t is the volume of a cubic partition. This situation was defined in Equation (19).

[0168] An alternative to calculating Equation (20) is the standard manner of evaluating a two-tailed hypothesis for finite discrete probability distributions. However, no known way is provided for calculating the significance probability p for such two-tailed hypotheses, as was done for the symmetric, infinite Gaussian distribution. If the Poisson mean kϑ > 10, the Poisson distribution can be approximated by the Gaussian distribution with the same mean and variance as the original Poisson distribution (both equal to kϑ). At this level, the Poisson distribution becomes more symmetric about the mean.

[0169] A quantized continuity correction factor of plus or minus .5 is applied to the computations, since a discrete distribution is approximated by a continuous one. The significance probability p can then be calculated fairly accurately. This was the rationale for Equation (16), as derived from the Central Limit Theorem.
[0170] For small Poisson means, the distribution is skewed, so this Gaussian approximation is not valid and the p value cannot be obtained from it. This is the situation for which the probability test of the present invention was designed. The need for this probability test (for example, for very small samples) will rarely occur.

[0171] The algorithm derived as Equation (26) provides an estimate of the significance probability p for evaluating a two-tailed hypothesis for the quasi-symmetric, finite, discrete Poisson probability distribution. It is validated against the p values calculated for the large sample z test described in Equation (16).

[0172] The boundary above and below the Poisson mean kϑ, attributable to random variation and controlled by a false alarm rate, defines the critical region of the test. In practice, there is no knowledge of the population parameter θ and the functional parameters of that measure. Rather, one works with the sample observations ϑ and k, comparing the observed frequency structure against a theoretical probability distribution that models random noise.

[0173] In essence, this exact probability test determines whether the observed number of non-empty cells m is contained within the boundaries of the theoretical Poisson model expectations, a situation indicative of random noise for the three-dimensional data set. If m falls in the critical region, a signal waveform is suspected.


Lower  Boundary  Value  of  Critical  Region 

[0174] The test procedure is described in detail below, and an example is provided to clarify each step.


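[0175] Let y1 be the integer quantity forming the lower boundary of the sample mean kϑ, given by the Poisson criterion:

$$
\Pr(Y \le y_1) \le \frac{\alpha}{2}, \qquad \min\left(\frac{\alpha}{2} - \frac{\alpha_0}{2}\right)
\tag{21}
$$

where

$$
\Pr(Y \le y) = \sum_{j=0}^{y} P(j; k\vartheta)
$$

[0176] Pr is probability and P(y; kϑ) is the discrete Poisson probability distribution function given as:

$$
P(y; k\vartheta) = \frac{e^{-k\vartheta}\,(k\vartheta)^{y}}{y!}, \qquad \text{where} \quad \sum_{y=0}^{\infty} P(y; k\vartheta) = 1.0
$$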

[0177] The upper limit on the summation is finite in practice, selected such that the summation achieves a predetermined level of convergence (for example, the sum is approximately 0.99999).
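The Poisson probability function of [0176] and the truncation rule of [0177] can be sketched in Python as follows. The helper names are hypothetical, and evaluating the probability function in log space is a choice made here for numerical safety rather than something the text specifies; the loop simply stops once the running sum reaches the convergence level.

```python
import math

# Sketch only: helper names are hypothetical; log-space evaluation is our choice.
# Requires mean > 0.


def poisson_pmf(y: int, mean: float) -> float:
    """P(y; mean) = exp(-mean) * mean**y / y!, evaluated in log space to avoid overflow."""
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))


def poisson_table(mean: float, convergence: float = 0.99999):
    """Return (pmf, cdf) lists for y = 0, 1, 2, ... stopping once the cdf reaches `convergence`."""
    pmf, cdf, total, y = [], [], 0.0, 0
    while total < convergence:
        p = poisson_pmf(y, mean)
        total += p
        pmf.append(p)
        cdf.append(total)
        y += 1
    return pmf, cdf


pmf, cdf = poisson_table(39.86)
print(len(cdf) - 1)             # last y included in the truncated sum
print(round(100 * cdf[39], 3))  # cumulative percent at y = 39 (Table 6 lists 48.789%)
```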

[0178] The quantity α0 is the probability nearest to an exact value of the pre-specified false alarm probability α, and y1 is the largest value of y such that Pr(Y ≤ y) ≤ α/2. It is an objective to minimize the difference between α and α0. The Probability of False Alarm (PFA) is typically set at five percent.


Upper  Boundary  Value  of  Critical  Region 

[0179] The upper boundary of the Poisson probability test is called y2 and is determined in a manner similar to that for determining the lower boundary value y1.

[0180] Let y2 be the integer quantity forming the upper random boundary of the mean kϑ, given by:

$$
\Pr(Y > y_2) \le \frac{\alpha}{2}, \qquad \min\left(\frac{\alpha}{2} - \frac{\alpha_0}{2}\right)
\tag{23}
$$

where

$$
\Pr(Y > y) = 1 - \sum_{j=0}^{y} P(j; k\vartheta)
$$
[0181] The value α0 is the probability closest to an exact value of the pre-specified false alarm probability α, and y2 is the smallest value of y such that Pr(Y > y) ≤ α/2. It is an objective to minimize the difference between α and α0.

[0182] Hence, the subsystem determines whether the frequency structure places the observed count y of non-empty partitions outside the critical region, thereby warranting a determination of random (otherwise, the call is nonrandom), with the associated PFA α being the probability of calling the data nonrandom when they are in fact random (see the discussion for Table A).

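[0183] The overall Type I error rate α0 is the sum Pr(Y ≤ y1) + Pr(Y > y2). Because each tail is held at or below α/2, α0 does not exceed the nominal α, so the protection against a Type I error is often higher than nominal (as previous research has indicated). This value is also known as the actual level of significance.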


HYPOTHESIS 

[0184] In the hypothesis test, the subsystem assesses and evaluates the random process binary hypothesis for small samples by means of the sample proportion of non-empty partitions:

H0: θ = ϑ (NOISE)
H1: θ ≠ ϑ (SIGNAL + NOISE)

DECISION RULE

y1 < y < y2 => Noise
y ≤ y1 or y ≥ y2 => Signal + Noise,

where y = m (the number of non-empty cells in a sample).
[0185] The false alarm rate is set at α = .05 for very small samples. For (At < 16), α may be set to 0.10.
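The boundary search of Equations (21) and (23), the actual level of significance of [0183], and the DECISION RULE can be combined in one Python sketch. The helper names are hypothetical, as is returning −1 when no lower critical value exists (which happens for small means, where Pr(Y = 0) already exceeds α/2).

```python
import math

# Sketch only: helper names are hypothetical; y1 = -1 signals that no lower boundary exists.


def poisson_cdf_table(mean: float, upper: int):
    """Cumulative Poisson probabilities Pr(Y <= y) for y = 0..upper (log-space pmf)."""
    cdf, total = [], 0.0
    for y in range(upper + 1):
        total += math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))
        cdf.append(total)
    return cdf


def critical_boundaries(mean: float, alpha: float = 0.05):
    """Return (y1, y2, actual_alpha) for the exact two-tailed Poisson test.

    y1: largest y with Pr(Y <= y) <= alpha/2 (Equation (21)).
    y2: smallest y with Pr(Y > y) <= alpha/2 (Equation (23)).
    actual_alpha: Pr(Y <= y1) + Pr(Y > y2), the actual level of significance.
    """
    half = alpha / 2.0
    upper = int(mean + 10.0 * math.sqrt(mean)) + 10   # far enough into the upper tail
    cdf = poisson_cdf_table(mean, upper)
    lower_hits = [y for y in range(upper + 1) if cdf[y] <= half]
    y1 = lower_hits[-1] if lower_hits else -1
    y2 = next(y for y in range(upper + 1) if 1.0 - cdf[y] <= half)
    lower_tail = cdf[y1] if y1 >= 0 else 0.0
    return y1, y2, lower_tail + (1.0 - cdf[y2])


def decide(m: int, y1: int, y2: int) -> str:
    """DECISION RULE: y1 < m < y2 is noise; otherwise signal plus noise."""
    return "NOISE" if y1 < m < y2 else "SIGNAL + NOISE"


y1, y2, actual_alpha = critical_boundaries(39.86)
print(y1, y2, round(actual_alpha, 3), decide(39, y1, y2))  # 27 53 0.039 NOISE
```

With kϑ = 39.86 the sketch lands on the same boundaries, actual significance level, and noise call as the worked example of [0193] and [0194].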

[0186] As mentioned, the quantity θ is the unknown population parameter representing the probability that a cell is non-empty in a completely random Poisson distribution, and ϑ is the sample value indicated above, ϑ = 1 − exp(−λt).

[0187] Note that the actual Poisson hypothesis test uses the calculated mean kϑ (referred to as the Poisson mean μ or λ in probability distribution tables) to carry out the calculations for assessing the hypothesis set. This is done for convenience, since the probability ϑ is a small quantity (0 < ϑ < 1) in finite samples, which would give a restricted range of the integer count parameter y in the Poisson probability function P(y; kϑ).

[0188] In practice, one does not possess a priori knowledge of the population parameters; therefore, sample spatial data are compared against a known probability function that characterizes the structure of a random distribution. The Poisson distribution is known to model a random process for distributions on the line interval and in the partitioned plane.

Example  of  Testing  Procedure 


[0189] Because the principle of the exact Poisson probability test is the same regardless of the sample size, the testing procedure is illustrated for the data previously analyzed (see Table 4). The following summary data (ϑ = .319; kϑ = 39.86; m = 39) are required to carry out the computations and to arrive at a reasonable decision for the signal-noise hypothesis.

[0190] As previously defined, m is the observed number of non-empty partitions, ϑ = .319 is the probability that a cell is non-empty, and kϑ is the mean or expected number of non-empty partitions in a random distribution.

[0191] From these data, the two-tailed hypothesis set is:

H0: θ = .319 (NOISE)
H1: θ ≠ .319 (SIGNAL + NOISE)


[0192] This input generates the discrete Poisson distribution shown in Table 6.

Table 6: Poisson Probability Distribution for kϑ = 39.86

 y    Prob      Cum %        y    Prob      Cum %
16    0.001%     0.002%     43    5.382%    72.388%
17    0.002%     0.004%     44    4.875%    77.263%
18    0.005%     0.009%     45    4.318%    81.581%
19    0.010%     0.019%     46    3.742%    85.323%
20    0.021%     0.040%     47    3.173%    88.496%
21    0.039%     0.079%     48    2.635%    91.131%
22    0.071%     0.150%     49    2.143%    93.275%
23    0.123%     0.272%     50    1.709%    94.983%
24    0.204%     0.476%     51    1.335%    96.319%
25    0.325%     0.801%     52    1.024%    97.342%
26    0.498%     1.300%     53    0.770%    98.112%
27    0.736%     2.036%     54    0.568%    98.680%
28    1.047%     3.083%     55    0.412%    99.092%
29    1.440%     4.523%     56    0.293%    99.385%
30    1.913%     6.435%     57    0.205%    99.590%
31    2.459%     8.894%     58    0.141%    99.731%
32    3.063%    11.958%     59    0.095%    99.826%
33    3.700%    15.657%     60    0.063%    99.889%
34    4.337%    19.995%     61    0.041%    99.930%
35    4.939%    24.934%     62    0.027%    99.957%
36    5.469%    30.403%     63    0.017%    99.974%
37    5.891%    36.294%     64    0.010%    99.984%
38    6.179%    42.473%     65    0.006%    99.991%
39    6.315%    48.789%     66    0.004%    99.995%
40    6.293%    55.082%     67    0.002%    99.997%
41    6.118%    61.200%     68    0.001%    99.998%
42    5.806%    67.006%     69    0.001%    99.999%
[0193] Based on the PFA of .05, the critical boundaries are computed to be y1 = 27 and y2 = 53. These values are virtually the same as those found for the ninety-five percent CI for the large sample approximate z test (27, 52) in Method D. They also provide evidence that the computations are consistent, as well as evidence that the Gaussian z test is adequate. Since m = 39 and y1 < 39 < y2, the null hypothesis of noise only is accepted. The number of non-empty cells in the space of one hundred and twenty-five partitions is consistent with a Poisson random distribution.

[0194] The actual level of significance is found by calculating the sum Pr(Y ≤ y1) + Pr(Y > y2) = .02036 + .01888 = .039, which is below the a priori value of α = .05 (the nominal value is roughly twenty-five percent higher, .05/.039 ≈ 1.28), so the protection against a Type I error exceeds the nominal level. This significance level represents the actual probability of incorrectly labeling a noise waveform as signal. The difference 1 − [Pr(Y ≤ y1) + Pr(Y > y2)] = .961 is the confidence that the operator has when deciding that noise is the correct decision, that is, Pr(NOISE | NOISE). Considering the data from the exact Poisson hypothesis test, the noise-only decision is reasonable. The results are consistent with the large-sample method of Method D but provide a higher degree of confidence.

Alternative Testing Procedure for the Signal-Noise Hypothesis (Estimate of p)


[0195] In the case of the continuous Gaussian distribution, the p value from the large-sample z test in Equation (16) is relatively easy to compute because of the mirror symmetry of the distribution about the mean. The derivation of the Gaussian p measure can be expressed in conceptual terms of areas, where z (taken as nonnegative) is the assumed calculated value in Equation (4). By definition, and rearranging:

$$
\begin{aligned}
p &= \frac{1}{\sqrt{2\pi}}\left[\int_{-\infty}^{-z} e^{-0.5x^{2}}\,dx + \int_{z}^{\infty} e^{-0.5x^{2}}\,dx\right] \\
  &= 1 - \left[\frac{1}{\sqrt{2\pi}}\int_{-z}^{0} e^{-0.5x^{2}}\,dx + \frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-0.5x^{2}}\,dx\right] \\
  &= 1 - \left[\left(.5000 - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-z} e^{-0.5x^{2}}\,dx\right) + \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-0.5x^{2}}\,dx - .5000\right)\right] \\
  &= 1 - \left[\Pr(Z < z) - \Pr(Z < -z)\right]
\end{aligned}
\tag{24}
$$

the last line being the computational form of Equation (4).
[0196] In a similar fashion, an estimate of p was derived for the discrete Poisson distribution. The algorithm of Equation (26) is used for estimating p in the small sample Poisson distribution test when the mean kϑ > 10 (so that the distribution approximates a symmetric Gaussian distribution).

[0197] The procedural steps are, first, to consider any observed y value in Table 6 (assumed to be less than or equal to the mean μ) to be a lower limit called yL. Second, assume that the mean has a theoretical cumulative probability Pr(Y ≤ μ) = .50000 (that is, the area up to the mean is one-half of the total probability mass), as in the Gaussian case. This is a basic but untestable assumption. However, it can be checked by linear interpolation on the interval [39, 40] about the mean, using the cumulative percentages:

$$
y = 39 + \frac{.50000 - .48789}{.55082 - .48789}\,(40 - 39) = 39.192
\tag{25}
$$

which is the theoretical value of y lying half-way through Table 6.
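The interpolation of Equation (25), and the 1.7 percent difference reported in [0198], amount to a few lines of Python. The function name `interpolate_half_point` is hypothetical; the cumulative values are read from Table 6.

```python
# Sketch only: the function name is hypothetical; inputs are the Table 6 cumulative values.


def interpolate_half_point(y_low: int, cum_low: float, cum_high: float, target: float = 0.5) -> float:
    """Linearly interpolate the y between y_low and y_low + 1 where the cumulative probability hits target."""
    return y_low + (target - cum_low) / (cum_high - cum_low) * 1.0


half_point = interpolate_half_point(39, 0.48789, 0.55082)
mean = 39.86
print(round(half_point, 3))                        # 39.192
print(round(100 * (mean - half_point) / mean, 1))  # 1.7 (percent difference from the mean)
```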

[0198] This calculation differs from the actual Poisson mean kϑ = 39.86 by only 1.7 percent. Alternate estimation procedures, such as geometric mean averaging, lead to no more than a 1.8 percent difference. Thus, the assumption appears to be reasonable for large kϑ values and is used in the estimation procedure.

[0199] Third, a value is found that is equidistant above the mean μ, to represent the upper and approximately equi-probable outcome of the experiment for a two-sided signal and noise hypothesis. One minus the difference in the cumulative probabilities of yL and yU, taken with respect to the mean kϑ = 39.86, is the estimate of p, analogous to Equation (24). The values yL and yU are roughly symmetric with respect to the mean. The input value is designated as yL or yU by comparison to kϑ.

[0200] The lower/upper y values and p are determined in the following manner for the case of y either at/below or above the mean kϑ:

UPPER BOUND DETERMINATION (y ≤ kϑ)

$$
\begin{aligned}
\mu &= \operatorname{int}(k\vartheta) \\
y_L &= y \\
y_U &= \mu + |\mu - y_L| \\
p &= 1 - \left[\left(.5000 - \Pr(Y \le y_L)\right) + \left(\Pr(Y \le y_U) - .5000\right)\right] \\
  &= 1 - \left[\Pr(Y \le y_U) - \Pr(Y \le y_L)\right]
\end{aligned}
$$

LOWER BOUND DETERMINATION (y > kϑ)

$$
\begin{aligned}
\mu &= \operatorname{int}(k\vartheta) \\
y_U &= y \\
y_L &= \mu - |y_U - \mu| \\
p &= 1 - \left[\left(.5000 - \Pr(Y \le y_L)\right) + \left(\Pr(Y \le y_U) - .5000\right)\right] \\
  &= 1 - \left[\Pr(Y \le y_U) - \Pr(Y \le y_L)\right]
\end{aligned}
\tag{26}
$$

where "int" is an integer operator defined as the integer part of a real number and |·| is the absolute value. The first term, μ = int(kϑ), is only the whole number; for example, int(39.86) = 39 = μ. The second term is the absolute value of the difference between the mean μ and the values yL and yU.
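Equation (26) and the decision rule of [0214] can be sketched in Python as below. The helper names are hypothetical, and assigning the tie p = α to SIGNAL + NOISE is an assumption, since the text does not cover it. The guard reflects the restriction of [0213] to means above 10.

```python
import math

# Sketch only: helper names are hypothetical; the arithmetic follows Equation (26).
# Treating p == alpha as SIGNAL + NOISE is an assumption.


def poisson_cdf(y: int, mean: float) -> float:
    """Pr(Y <= y) for a Poisson variable, summing the pmf in log space; 0 for y < 0."""
    return sum(math.exp(j * math.log(mean) - mean - math.lgamma(j + 1)) for j in range(y + 1))


def estimate_p(y: int, mean: float) -> float:
    """Mirror y about mu = int(mean) and return 1 - [Pr(Y <= yU) - Pr(Y <= yL)]."""
    if mean <= 10:
        raise ValueError("Equation (26) is used only when the Poisson mean exceeds 10")
    mu = int(mean)
    if y <= mean:                       # upper bound determination: y is the lower limit
        y_lower, y_upper = y, mu + abs(mu - y)
    else:                               # lower bound determination: y is the upper limit
        y_lower, y_upper = mu - abs(y - mu), y
    return 1.0 - (poisson_cdf(y_upper, mean) - poisson_cdf(y_lower, mean))


def decide_by_p(p: float, alpha: float = 0.05) -> str:
    """Rule of [0214]: p > alpha => NOISE; otherwise SIGNAL + NOISE."""
    return "NOISE" if p > alpha else "SIGNAL + NOISE"


for y in (39, 20, 27, 53):
    p = estimate_p(y, 39.86)
    print(y, round(p, 3), decide_by_p(p))
# 39 1.0 NOISE / 20 0.003 SIGNAL + NOISE / 27 0.057 NOISE / 53 0.027 SIGNAL + NOISE
```

The four calls correspond to the worked examples in Equations (27) and (29) through (31).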

[0201] Applying the algorithm of Equation (26) to the experimentally observed value y = m = 39 non-empty partitions in Table 4 provides an estimate of p for testing the null hypothesis of noise only:

$$
\begin{aligned}
y &= m = y_L = 39 \\
\mu &= 39 \\
y_U &= 39 \quad (\text{since } |\mu - y_L| = 0) \\
p &= 1 - \left[.5000 - \Pr(Y \le 39) + \Pr(Y \le 39) - .5000\right] \\
  &= 1 - \left[\Pr(Y \le 39) - \Pr(Y \le 39)\right] = 1.00
\end{aligned}
\tag{27}
$$

[0202] The estimated significance probability for y = m = 39 is therefore p = 1.00. Using the nominal level of significance α = .05, since p = 1.00 > α => NOISE, the operator can conclude that this time waveform contains virtually no signal information. To validate this value against the Gaussian calculation, use Equation (16) with m set to y, but apply a quantized value of +.5 to the numerator. For example:

$$
z = \frac{39 - k\vartheta + .5}{\sqrt{k\vartheta}} = \frac{39 - 39.86 + .5}{\sqrt{39.86}} \approx -0.06
\tag{28}
$$


[0203] Then, compute the p value from Equation (4) for the observed z-test value. This value is found to be p = .95. If z were 0, p would be 1.00 by Equation (4).

[0204] The rule of the present method for applying the quantized continuity correction factor is: if y − kϑ < 0, add +.5, and if y − kϑ > 0, add −.5. The algorithm appears acceptable when applied to selected values of y in Table 6 and compared to the Gaussian distribution.
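The Gaussian cross-check with the continuity correction rule of [0204] can be sketched in Python as follows. The function name is hypothetical, and the form assumed for Equation (16), z = (y − kϑ ± .5)/√kϑ, is an inference: it reproduces the CI (27, 52) and z ≈ −0.06 quoted above. The printed p values land close to the .95, .05 and .046 quoted in [0203] to [0210], but not always exactly (for y = 20 this form gives about .002 rather than .003).

```python
import math

# Sketch only: the function name is hypothetical, and the form of Equation (16) used here,
# z = (y - k*theta +/- 0.5) / sqrt(k*theta), is an assumption.


def gaussian_p(y: int, mean: float):
    """Return (z, two-tailed p) using the continuity correction rule of [0204]."""
    diff = y - mean
    if diff < 0:
        diff += 0.5      # y below the mean: add +.5
    elif diff > 0:
        diff -= 0.5      # y above the mean: add -.5
    z = diff / math.sqrt(mean)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p


for y in (39, 20, 27, 53):
    z, p = gaussian_p(y, 39.86)
    print(y, round(z, 2), round(p, 3))
```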

[0205] For example, the likelihood that y was observed to be twenty is:

$$
\begin{aligned}
y &= y_L = 20 \\
\mu &= 39 \\
y_U &= 39 + |39 - 20| = 58 \\
p &= 1 - \left[.5000 - \Pr(Y \le 20) + \Pr(Y \le 58) - .5000\right] \\
  &= 1 - \left[\Pr(Y \le 58) - \Pr(Y \le 20)\right] = 1 - [.99731 - .00040] = .003
\end{aligned}
\tag{29}
$$

[0206] The significance probability for twenty (20) non-empty partitions is only .003. This appears reasonable, since twenty is an extreme value compared to the mean 39.86 and is therefore unlikely to arise from noise alone. The Gaussian p calculation with the continuity correction factor of +.5 for y = 20 gives the same result (p = .003).

[0207] Another example: if y = 27, the likelihood of that value can be estimated by the algorithm:

$$
\begin{aligned}
y &= y_L = 27 \\
\mu &= 39 \\
y_U &= 39 + |39 - 27| = 51 \\
p &= 1 - \left[.5000 - \Pr(Y \le 27) + \Pr(Y \le 51) - .5000\right] \\
  &= 1 - \left[\Pr(Y \le 51) - \Pr(Y \le 27)\right] = 1 - [.96319 - .02036] = .057
\end{aligned}
\tag{30}
$$
[0208] By comparison, the Gaussian calculation with the continuity correction factor of +.5 for y = 27 gives p = .05.

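[0209] Lastly, if y = 53, the Poisson p can be estimated by the algorithm:

$$
\begin{aligned}
y &= y_U = 53 \\
\mu &= 39 \\
y_L &= 39 - |53 - 39| = 25 \\
p &= 1 - \left[.5000 - \Pr(Y \le 25) + \Pr(Y \le 53) - .5000\right] \\
  &= 1 - \left[\Pr(Y \le 53) - \Pr(Y \le 25)\right] = 1 - [.98112 - .00801] = .027
\end{aligned}
\tag{31}
$$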

[0210]  The  p  for  the  Gaussian  is  .046  with  the  continuity 
correction  factor  of  -0.5. 

[0211] The algorithm for estimating p to test the two-tailed signal-noise hypothesis is workable, as validated against the Gaussian distribution. Overall, the differences between the Poisson and Gaussian p values are small.

[0212] To further validate the derived process, the algorithm in Equation (26) was applied to other Poisson distributions with means as low as 10 to 15. The means of those distributions differ from the interpolated means by no more than 4.4 percent, which is an adequate tolerance with which to compute p values to test the signal-noise hypothesis. When the mean is approximately one hundred (100), only a 0.96 percent difference exists. At a mean of five hundred (500), the difference is down to .20 percent, and it continues to decrease with higher values of the Poisson mean.
[0213] Consequently, the algorithm is incorporated into the present method when the sample mean kϑ > 10, regardless of the sample size. When the mean does not meet the criterion kϑ > 10, the p value cannot be estimated by Equation (26) because of the high asymmetry of such distributions. The two-tailed signal-noise hypothesis must then be evaluated in the standard manner, using the DECISION RULE under HYPOTHESIS.

[0214] Employing the algorithm of Equation (26) provides an alternate way to assess the signal and noise hypothesis: the estimated p is compared to the approximate PFA by the adopted rule p > α => NOISE; p < α => SIGNAL + NOISE. That procedure would result in a faster solution.

[0215] Note that in the foregoing, the data derived from the Poisson frequency distribution (Table 4) of Method E have been used to illustrate the testing procedures for Method D. Those computations provided forty-eight (48) as the sample size, based on the approximate procedure modeled on the formalism provided by the Feller reference (Chapter 6). In the alternative, if Method E is not implemented, the operator may use the actual sample size of fifty (50) points to carry out the computations for testing the signal-noise hypothesis. This has the effect of using the mean:

$$
125\left[1 - \exp\left(-\frac{50}{125}\right)\right] \approx 41.21
\tag{32}
$$

compared to kϑ = 39.86 computed from Equation (20).

[0216] That is, Table 6 would be based on 41.21 instead of 39.86. All numerical results would change slightly, but the conclusion that the input time waveform is noise would not differ.

[0217] In conclusion, the fifty-point pseudo-random distribution in Table 4 has been analyzed with the testing procedures of the present invention. Each test has given the same result, which is random noise.

[0218] FIG. 2 and FIGS. 3A to 3C diagrammatically show the steps of the embodiment of data characterization method 50 and the embodiment of data characterization method 100.

[0219] In FIG. 2, in support of the data characterization method 50, step 52 provides a measurement input of data based on a plurality of measurements of physical phenomena, such as sonar, medical imaging, or the like.

[0220] For example, step 52 comprises reading input data vectors (t, y, z), where t is clock time and y, z are amplitude measures in the time domain. This step may also comprise performing pre-processing conditioning, filtering, and formatting, and selecting a discrete sample size N.

[0221] Step 54 comprises forming a three-dimensional convex hull over the data. The data are then partitioned into volumes based on a partitioning algorithm, as previously described (see Method 1-4). The convex hull can average approximately fifty-two percent of the containing region formed by the t, y, z volume, thereby providing a significant increase in processing speed as compared to the prior art.

[0222] In step 56, a determination is made as to whether the sample size is large or small. While presently preferred embodiments for this value (N > 25) have been given hereinbefore, it will be understood that the selected parameters may vary. Thus, if the number of data elements is greater than a selected parameter, range of parameters, or formula based on parameters, then the sample size is considered large; if not, the sample size is considered small.

[0223] Based on the determination made in step 56, a set of tests is utilized for a small sample size, as indicated at data analysis module 58, or for a large sample, as indicated at data analysis module 60. At decision module 62, the tests are conducted and followed by an "all or nothing" decision rule: if all accepted tests indicate random noise, then that is the determination; otherwise, the determination is signal plus noise.

[0224] For the LARGE SAMPLE TEST MODULE (N > 25, as indicated at step 56), the following tests comprise one possible presently preferred embodiment: a Runs Test; an R Ratio; a Correlation Module; a Normal Approximate z-Test/Confidence Interval (CI) Analysis; a Chi-square Test (alternative); and nonlinear correlation and other correlation techniques (alternative).

[0225] For the SMALL SAMPLE TEST MODULE (N < 25, as indicated at step 56), the following tests comprise one possible embodiment: a Runs Test; an R Ratio; a Correlation Module; and a Poisson Probability Test. One possible and presently preferred order of the testing protocol is:


Large Sample
(Data Analysis Module 60)

TEST ORDER     TESTING PROCEDURE
First          Wald-Wolfowitz Runs test for independent samples
                 • Normal approximation
                 • Exact probability computation
Second         Correlation Method
                 • Linear R
                 • Serial Correlation (Autocorrelation)
                 • Correlogram
Third          R Ratio and Confidence Interval (CI) Analysis
Fourth         Normal Approximation z-Test for Poisson distribution on the
               number of non-empty partitions
               Exact Poisson Distribution Hypothesis Test (alternative)
Alternative    Chi-square Test of Homogeneity
Alternative    Nonlinear and other correlational techniques

NOTE: Testing continues while noise is the current decision, until one signal instance is detected.

Small Sample
(Data Analysis Module 58)

TEST ORDER     TESTING PROCEDURE
First          Wald-Wolfowitz Runs test for Independent Samples (Exact test)
Second         Correlation Method
                 • Linear R
                 • Serial Correlation (Autocorrelation)
                 • Correlogram
Third          R Ratio and Confidence Interval Analysis
Fourth         Exact Poisson Distribution Hypothesis Test
                 • Standard Approach
                 • Significance probability

NOTE: Testing continues while noise is the current decision, until one signal instance is detected.

[0226] As indicated at step 62 of the decision module, if all test results indicate noise, then the data are considered to be random. If any test indicates nonrandom, then the data contain signal information.

[0227] As indicated at step 64, reports may comprise outputs of analysis results in summary form, archiving (graphics and text), and the next window of data to be processed.

[0228] In regard to FIG. 3A, steps 102, 104, 106, 108, 110, 112, and 114 of data characterization method 100 correspond to the previously discussed steps 52, 54, 56, 58, 60, 62, and 64 of FIG. 2. However, the characterization method 100 provides that testing continues only while noise is the present conclusion.

[0229] Referring to FIGS. 3B and 3C, a logic test 116 is provided after each test, whereby if any test determines that a signal is present (for example, the data are not random noise), testing is terminated at that point with a determination of a signal by the decision module 112.

[0230] Alternatively, if all tests characterize the data set as noise, the method characterizes the data set as noise, as indicated at 120 in FIG. 3B and at 118 in FIG. 3C. As discussed hereinbefore with regard to data characterization method 50, testing is conditional upon the size of the sample: the sample size dictates the testing procedures used to evaluate the signal and noise hypotheses.
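The sequential variant (method 100) and size-dependent test selection can be combined in one minimal sketch. The cutoff `SMALL_SAMPLE_MAX`, the p-value interface, and alpha are all hypothetical. The source specifies only that the sample size selects the test battery and that testing stops at the first signal, with noise reported only when every test agrees.

```python
from typing import Callable, Optional, Sequence

# Hypothetical cutoff: the source does not state where "small" ends and "large" begins.
SMALL_SAMPLE_MAX = 30

# A test maps the data to a p-value for the noise (randomness) hypothesis.
Test = Callable[[Sequence[float]], float]


def classify_sample(n: int, small_max: int = SMALL_SAMPLE_MAX) -> str:
    """Classify a sample as 'small' or 'large' by its size n."""
    if n < 1:
        raise ValueError("sample must be non-empty")
    return "small" if n <= small_max else "large"


def characterize(
    data: Sequence[float],
    batteries: dict[str, Sequence[tuple[str, Test]]],
    alpha: float = 0.05,
    small_max: int = SMALL_SAMPLE_MAX,
) -> tuple[str, Optional[str]]:
    """Run the battery chosen by sample size, stopping at the first signal.

    Returns ("signal", name of the flagging test) or ("noise", None).
    """
    battery = batteries[classify_sample(len(data), small_max)]
    for name, test in battery:
        if test(data) < alpha:      # logic test after each test
            return "signal", name   # terminate: signal detected
    return "noise", None            # every test characterized the data as noise
```

Here `batteries` would hold the ordered tests from the large- and small-sample tables, for example `{"small": [("runs (exact)", exact_runs_p), ...], "large": [...]}`, where `exact_runs_p` is a placeholder for any function that returns a p-value. Because the loop returns early, later tests are never evaluated once a signal is found.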

[0231] One utility of the present method is in the field of signal processing and other data processing fields in which it is of interest to know whether the measurement structure is random in the presence of potentially highly sparse data sets contained in a compact volume. The method provides a faster solution for randomness determination than prior art methods.

[0232] A significant new feature is an explicit method to handle very small samples by means of a polygon envelope, thereby creating a more concentrated region for analysis. The calculation of the significance probability (as discussed hereinbefore for small samples) constitutes another novel, useful, and non-obvious feature. The present inventive method can most likely be adapted to two-dimensional data sets in order to identify and filter Poisson noise. Finally, a four-dimensional time series analysis can be performed if measures {x, y, z} are captured at discrete time intervals t.
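The polygon envelope for the two-dimensional adaptation mentioned in this paragraph can be sketched with Andrew's monotone-chain convex hull and the shoelace area formula. This is only a 2-D illustration. A 3-D hull, as used in the main method, needs more machinery, for example SciPy's `scipy.spatial.ConvexHull`. The function names and the points-per-unit-area density measure are assumptions.

```python
from typing import Sequence

Point = tuple[float, float]


def convex_hull(points: Sequence[Point]) -> list[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point; it is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula for a simple polygon given by ordered vertices."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def envelope_density(points: Sequence[Point]) -> float:
    """Points per unit area inside the convex polygon envelope."""
    area = polygon_area(convex_hull(points))
    if area == 0:
        raise ValueError("points are collinear; envelope has no area")
    return len(points) / area
```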

[0233] Various alternatives to the above-discussed methods are possible. For example, during partitioning, an additional step may establish a criterion for eliminating, or reducing the number of, non-whole cubic subspaces in the analysis. For instance, subspace segments smaller than one-half to three-quarters of the size of the volume t may be eliminated. This criterion reduces the sample size. For small samples, it may lower the power of the testing procedures, although the probability assumptions will be less violated.

[0234] During analysis, a Chi-square test for homogeneity (large samples) and nonlinear correlation may also be utilized. In addition, the small-sample Poisson probability test can be used in place of the large-sample approximate test, since it provides exact probabilities regarding the signal-noise hypothesis. The autocorrelation function (ACF) may be computed for as many lags as possible.
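The sketch below covers two computations named in this paragraph. The first is the sample ACF evaluated at every lag from 0 to n-1, the most that a series of length n allows. The second is an exact Poisson tail probability shown alongside its large-sample normal approximation. The count k and mean lam used by the patent's Poisson test are defined hereinbefore and are treated here as generic inputs. The function names are hypothetical. The term-by-term sum assumes a moderate lam, because exp(-lam) underflows for very large lam.

```python
import math
from typing import Sequence


def acf_all_lags(xs: Sequence[float]) -> list[float]:
    """Sample autocorrelation r_k for every lag k = 0 .. n-1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    d = [x - mean for x in xs]
    denom = sum(v * v for v in d)
    if denom == 0:
        raise ValueError("constant series has no defined ACF")
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom for k in range(n)]


def poisson_cdf(k: int, lam: float) -> float:
    """Exact P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def poisson_upper_tail(k: int, lam: float) -> float:
    """Exact P(X >= k)."""
    return 1.0 - poisson_cdf(k - 1, lam)


def poisson_z(k: int, lam: float) -> float:
    """Large-sample normal approximation: z = (k - lam) / sqrt(lam)."""
    return (k - lam) / math.sqrt(lam)
```

ACF values at the highest lags rest on only a few products, so they are noisy. The exact tail is the quantity a small sample calls for, while the z value is only trustworthy for large counts.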

[0235] For non-time-series data, many variable-relational techniques are known to those skilled in the art, including multivariate nonlinear curve fitting, partial correlation, canonical correlation analysis, pattern recognition, image processing, feature extraction, and other multivariate data reduction techniques. However, these are large-sample methods that require significant analyst input to interpret their outcomes. In the decision module, testing may include as many procedures as needed to give the operator confidence that the data are noise or signal.

[0236] The methodology, improved over the prior art in this field, can be applied to the two-dimensional case for noise identification and filtering. Moreover, the inventive method can be applied to four-dimensional structures with time t concomitant with measures {x, y, z}.

[0237] Many additional changes in the details, components, steps, and organization of the system, herein described and illustrated to explain the nature of the invention, may be made by those skilled in the art within the principle and scope of the invention. It is therefore understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described.

METHOD FOR DETECTING A RANDOM PROCESS
IN A CONVEX HULL VOLUME


ABSTRACT OF THE DISCLOSURE

A method is provided for characterizing data sets containing data points as random or non-random. In the method, a convex hull envelope is constructed that contains the data points and passes through at least four non-coplanar data points. The convex hull envelope is partitioned into cells. The method classifies the data set according to its sample size and, based on that classification, selects a predetermined set of tests to operate on the data set.